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£^ 1 Abstract 

This paper considers optimization problems on the Stiefel manifold X T X = I p , where X £ R nxp is 
, the variable and I p is the p-by-p identity matrix. A framework of constraint preserving update schemes is 

proposed by decomposing each feasible point into the range space of X and the null space of X T . While this 
general framework can unify many existing schemes, a new update scheme with low complexity cost is also 
discovered. Then we study a feasible Barzilai-Borwein (BB)-like method under the new update scheme. The 
^3 ! global convergence of the method is established with an adaptive nonmonotone line search. The numerical 

^ (-4 tests on the nearest low-rank correlation matrix problem, the Kohn-Sham total energy minimization and a 

specific problem from statistics demonstrate the efficiency of the new method. In particular, the new method 
performs significantly better than some state-of-the-art algorithms for the nearest low-rank correlation matrix 
problem and is considerably competitive with the widely used SCF iteration for the Kohn-Sham total energy 
minimization. 
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O ■ 1 Introduction 

m i 

In this paper, we consider general feasible methods for optimization on the Stiefel manifold, 

min F(X), s.t. X T X = I p , (1.1) 

^ . x<=w- 

H 

where I p is the p-by-p identity matrix and F(X) : W ixp — > R is a differentiate function. The feasible set 
S n , p := {X e W ixp : X T X = I p } is referred to as the Stiefel manifold, which was due to Stiefel in 1935 [44]. 

Problem (1.1) captures many applications, for instance, nearest low-rank correlation matrix problem [22, 34, 
40], linear eigenvalue problem [20, 41], Kohn-Sham energy minimization [49], orthogonal Procrustes problem 
[16, 43], maximization of sums of heterogeneous quadratic functions from statistics [7, 36], sparse principal 
component analysis [13, 26, 53], leakage interference minimization [29, 33] and joint diagonalization(blind source 
separation) [25, 46]. For other applications, we refer interested readers to [15, 48] and the references therein. 
In general, problem (1.1) is difficult to solve due to the nonconvexity of the orthogonality constraint. In fact, 
some of the above examples, including the maxcut problem and the leakage interference minimization [29], are 
NP-hard. 
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With the wide applicability and fundamental difficulty, problem (1.1) has attracted many researchers. Based 
on the geometric structure, Rapcsak [36, 37] reformulated it as a smooth nonlinear program by introducing a 
new coordinate representation. From the point of view of manifold, some authors proposed a variety of feasible 
algorithms to solve problem (1.1). These algorithms include steepest descent methods [1, 3, 30, 31], Barzilai- 
Borwcin (BB) method [48], conjugate gradient methods [2, 3, 15], trust region methods [3, 51], Newton methods 
[3, 15], quasi- Newton methods [42] and subspacc methods [50]. Unlike the unconstrained case, it is not trivial 
to keep the whole iterations in the Stiefel manifold and the concept of retraction has played an important role 
(see [4, Theorem 15] and [3, Definition 4.1.1] for a detailed description on reactions). We shall classify the 
existing update schemes into two types: geodesic-like and projection-like update schemes. Briefly speaking, 
geodesic-like update schemes preserve the constraint by moving a point along the geodesic or quasi-geodesic 
while projection-like update schemes do so by (approximately) projecting a point into the constraint. We will 
delve into the details of these update schemes in §2.1. 

In this paper, we develop a framework of constraint preserving update schemes based on a novel idea; i.e., 
decomposing each feasible point into the range space of X and the null space of X T . This framework can not 
only unify most existing schemes including gradient projection, Manton's projection, polar decomposition, QR 
factorization and Wen- Yin update schemes (they will be mentioned in §2.1), but also leads to the discovery of 
a new scheme with low complexity cost. 

Under the new update scheme, we look for a suitable descent feasible curve along which the objective 
function can achieve a certain decrease by taking a suitable stcpsizc. Then we can treat the original problem as 
an unconstrained optimization problem. Since the Barzilai-Borwein method proves very useful for large-scale 
nonlinear optimization [6, 8, 9, 10, 17, 39, 52], we consider to combine this method with our new update scheme. 
To ensure the global convergence, we adopt the adaptive nonmonotone line search [8], leading to an adaptive 
feasible Barzilai-Borwein-like (AFBB) method for problem (1.1). We will prove the global convergence of the 
AFBB method in the numerical sense under some mild assumptions. Although our update scheme is also a 
retraction, the convergence of retraction-based line search methods in [3] cannot be applied to our methods. As 
a matter of fact, our convergence analysis is very different from that in [3] due to the nonmonotonicity of the 
method. 

The rest of this paper is organized as follows. In §2, we review the existing update schemes and give the 
first-order optimality condition of problem (1.1). In §3, we introduce our framework of constraint preserving 
update schemes and propose a new update scheme. Some properties of the new update scheme and comparisons 
with existing update schemes are also stated in this section. We present the AFBB method in §4.1, establish 
its global convergence in §4.2 and then discuss how to deal with the feasibility error in the new update scheme 
in §4.3. The AFBB method is also extended to more general problems in §5. Some numerical tests on an 
extensive collection of problems are presented to demonstrate the efficiency of our AFBB method in §6. Finally, 
conclusions are drawn in the last section. 

Notation: For matrix M £ R nxn , we define skew(M) = (M - M T )/2, sym(M) = (M + M T )/2. The 
maximal and minimal eigenvalues of M are denoted by A max (M) and A m i n (M), respectively. We use diag(ill) 
to stand for the vector formed by the diagonal entries of M. Meanwhile, we use Diag(6>i, ■ • • , 9 n ) to represent 
the diagonal matrix whose diagonal entries are 9\, • ■ ■ ,9 n . The set of n-by-n symmetric matrices is denoted by 
S n . For S £ S n , if S is positive semidefinite (positive definite), we mark S >z (S y 0). The Euclidean inner 
product of two matrices A,BG jg>mxn j s defined as (A,B) = tr(A T B), where tr(-) is the trace operator. If 
n > 1, we denote by and A_i the i-th column of A and the matrix obtained by replacing A^ with in A, 

respectively. The condition number of A is defined as cond(j4) = f ^r^r^ ) ■ Denote by e; the i-th unit 
vector of an appropriate size. For X £ 5„ iP , we define Px = I n — \XX T . The gradient of F respect to X is 
G := PJ r (A) = ( d Qx X I ' wnereas the gradient in the tangent space is denoted by VJ-. 
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2 Preliminaries 



2.1 Overview of existing update schemes 

Given any tangent direction D € R nxp satisfying X T D + D T X = with X <G S n , pi we briefly review the 
geodesic-like and projection-like update schemes. Note that the parameter r > in the following update 
schemes is some stepsize. 

Geodesic-like update schemes. In 1998, Edelman et al. [15] proposed a computable geodesic update scheme, 
in which the iterations lie in the curve defined by 



1 geo 



i(t; X) = XM(t) + QN(t), where 




expr( I 1 ~f )( { p ), (2-1) 



R 
XX T )D, A x = X T D. 

This strategy requires computing the exponential of a 2p-by-2p matrix and the QR factorization of an n-hy-p 
matrix at each iteration. Consequently, the flops will be high when p > n/2. Another geodesic approach 
is proposed by Abrudan et al. [1]. Given an n-by-n skew-symmetric matrix they considered the curve 
Y ge0 2(T) = exp{— tA 2 )A. Comparing with (2.1), this formula can efficiently deal with the case when p > n/2. 
Nevertheless, it still requires great efforts to compute the exponential of an n-by-n matrix. To avoid computing 
exponentials of matrices, Nishimori and Akaho [31] proposed a kind of quasi-geodesic approach. Given an n- 
by-n skew-symmetric matrix, still say A 2 , a special case of their update schemes is the Cayley transformation 
scheme 

Y qg (r;X) = (l n - ^ 2 ) _1 (/„ + \A 2 ) X. (2.2) 

For this approach, the computation cost for (2.2) is high especially for small p. In 2010, Wen and Yin [48] 
proposed the Cayley transformation update scheme, also dubbed as Crank-Nicholson-like update scheme, by 
extending their previous update formulation of p-harmonic flow to handle matrices. Using some clever tech- 
niques, they developed a simple constraint preserving update scheme as follows: 

Y wy ( T ;X) = X -Tu(l 2p + ^V T U^j V T X, (2.3) 

where U = [PxD, X], V = [X, —PxD] £ M. nx2-p . For convenience, we call it Wen- Yin update scheme through- 
out this paper. This formula has the lowest computation complexity per iteration among the existing geodesic- 
like approaches when p < n/2. However, when p > n/2, the cost is still expensive. To deal with this case, a 
low-rank update scheme is explored in [48]. 

Projection-like update schemes. In spite of the nonconvexity of the Stiefel manifold, it is possible to preserve 
the constraint by the projection. The projection of a rank p matrix C G M" xp onto <S„ jP is defined as the unique 
solution of 

min \\X-C\\ F , s.t. X T X = I V1 (2.4) 

xeR"><p 

where || ■ \\p is the Frobcnius norm. For any symmetric positive definite matrix B € W xp , denote by B 1 / 2 the 
unique square root of B. It is easy to see that the solution of (2.4) is Vs„ tP (C) = C(C T C)~ 1 / 2 . Then we can 
extend the gradient projection method for optimization with convex constraints for solving (1.1), yielding the 
update scheme 

Y gp (T;X)=V Sn jX-TG). (2.5) 

In fact, the famous power method [20] for the extreme eigenvalue problem of symmetric matrix is a special case 
of this gradient projection update scheme. Manton [30] considered another different projection scheme, 

Y mp (r;X)=V Sn jX + TD). (2.6) 

Absil et al. [3] proposed the polar decomposition 

Y pd (r; X) = (X + tD)(I p + t 2 D t D)-^ 2 . (2.7) 
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The polar decomposition is equivalent to Manton's projection update scheme, but has lower complexity cost. It 
is then mainly considered in this paper. It is worth noting that the QR factorization update scheme proposed 
in [3] can be regarded as an approximation to the polar decomposition update scheme. For any rank p matrix 
C £ R™ xp , assume that C = qr(C)£7 is the unique QR factorization with qr(C) being the orthogonal matrix 
and U being upper triangular with positive diagonal entries. The formula in [3] can be simplified as follows: 

Y qr (r;X) = qv(X+TD). (2.8) 

2.2 First-order optimality condition 

To begin with, we introduce some basic concepts related to the Stiefel manifold as in [15]. For any X £ 5 n ,p, 
define the tangent space at X as Tx '■= {A : X T A + A T X = 0}. Then the projection of any Z £ R nxp onto 
the tangent space 73c is Pj-(Z) = Z — Xsym(X T Z). 

There are two different metrics on S n ^ p . The first one is the Euclidean metric d e (A, A) = (A, A), which is 
independent of the point X. The second one is the canonical metric d c (A, A) = (A,PxA), which is related to 
X. The gradient VJ 7 £ Tx of a differential function T(X) : R" xp — > R on the Stiefel manifold is defined such 
that 

(G, A) = d c {VT, A) ee (V.F, P X A) (2.9) 
for all tangent vectors A at X. Solving (2.9) for VJ 7 such that X T VJ r is skew-symmetric yields 

VJ" = G — XG T X. 

Notice that VJ- is not the projection of G onto the tangent space at X. The latter should be G~ Xsym(X T G). 
We now give the first-order optimality condition without proof. It is analogous to Lemma 2.1 in [48]. 

Lemma 2.1 (First-order optimality condition). Suppose X is a local minimizer of problem (1.1). Then X 
satisfies the first- order optimality condition 

V X C(X, A) = G- XG T X = 0; 

i.e., VJ 7 = 0, with the associated Lagrange multiplier A = G T X . Besides, we have that 

G - X(2pG T X + (1 - 2p)X T G) = 0, for any p>0. 

3 Constraint Preserving update schemes 

For a feasible point X £ S n . p , denote TZ X = {XR : R £ W x p} and Af x = {W £ R" xp : X T W = 0} to be the 
range space of X and the null space of X T , respectively. It is well known that the two spaces are orthogonal 
to each other and their sum forms the whole space R nxp . As a result, any point in R raxp can uniquely be 
decomposed into the sum of two points, which belong to the two spaces, respectively. With this observation, 
we introduce our idea for a framework of constraint preserving update schemes for problem (1.1). 
Given a matrix W in the null space Afx , consider the following curve 

Y(t; X) = XR{t) + WN(t), (3.1) 

where R(t),N(t) £ W }Xp and r > is some parameter. In other words, this curve can be divided into two 
parts, namely XR(t) in the range space of X and WN(t) in the null space of X T . Our goal is to determine 
appropriate R(t) and N(t) such that the curve Y{t;X) is still feasible; i.e., Y T Y = I p . If we assume that 
R(t) = R(t)Z(t) and N(t) = N(t)Z(t), where Z(t) is always invertible, the curve (3.1) can be expressed by 

Y(t; X) = (XR(t) + WN(t))Z(t). (3.2) 

The condition that Y T Y = I p requires 

Z{t) t {R{t) t R{t) + N{t) t W t WN{t))Z{t) = I p ; 
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or, equivalently, 

ZiT^Zir)- 1 = R{t) t R{t) + N(t) t W t WN(t). (3.3) 

Once W, R(t) and N(t) have been determined in some way, Z(j) can be calculated by (3.3). Then R(t), N(t) 
and hence the whole curve Y(t; X) will be given. 

Now we focus on the choices of W, R(r) and N(t). As Y(t;X) with r > is a curve starting from the 
current iteration X, it is natural to impose that Y(0\ X) = X. To meet this condition, we notice the relation 
(3.2) and ask that R(0) = I p , iV(0) = and Z(0) = I p . With these choices, it follows by (3.2) that 

Y'(0; X) = X(R'(0) + Z'(0)) + WN'(0). (3.4) 

Assume that some matrix E is chosen, which is intended for —Y'(0;X); i.e., E = —Y'(0;X). Then (3.4) holds 
if 

W = -(I-XX T )E, R'(0) + Z'(0) = -X T E, N'(0)=I p . (3.5) 
To satisfy the conditions N(0) — and N'(0) = I p , we may simply choose 

N(t) = tI p . (3.6) 

By (3.3) and (3.6), we can get that 

Z'(0) T + Z'(0) = -{R'{0) T + R'(0)). 

This, with the second condition in (3.5), means that X T E must be a skew-symmetric matrix; i.e., E € 7x- In 
the following, we consider two approaches to choose R(t) and Z(t) such that (3.3) holds. 

Approach I. To meet (3.3) and R{0) = I p , we consider to choose R(t) = I p + tR'(0). Since N(t) = tI p , we 
can get Z(t) from (3.3) by the polar decomposition or the Cholesky factorization. 

If by the polar decomposition, Z(t) and Z'(t) are always symmetric. In this case, it is easy to show that 
Y(t; X) = V Sn p (X -tE + tX(R'(0) + X T E)), which with (3.5) further implies that 

Y(t- X) = V Sn jX - T E- tXZ'(0)), (3.7) 

where Z'(0) is any p-by-p symmetric matrix. If Z'(0) = 0, (3.7) reduces to the ordinary polar decomposition or 
Manton's projection update scheme. If Z'(0) = sym(X T G) and E = G — Xsym(X T G), it becomes the ordinary 
gradient projection update scheme. 

If by the Cholesky factorization, Z(t) and Z'(t) arc always upper triangular. Similarly, we can derive 

Y(t;X) =qr(X-TE-TXZ'(0)), (3.8) 

where Z'(0) is any p-by-p upper triangular matrix. When Z'(0) = 0, (3.8) reduces to the ordinary QR factor- 
ization update scheme. 

Approach II. In this approach, we regard R(t) is some function of Z(t). To solve (3.3) easily, we may assume 

R(t) = 2I p -Z{t)-\ (3.9) 
Substituting (3.6) and (3.9) into (3.3) leads to 

Z{t)- t + Z(t)- 1 = 2I p + —W T W. 
Consequently, Z(t) must be of the form 

Z{t) = (l p +^W T W + L{r)y\ (3.10) 

where L(t) is any p-by-p skew-symmetric matrix with L(0) = 0. Notice that the above inverse always exists 
since L(t) is skew-symmetric. The relations (3.9) and (3.10) indicate that R'(0) = Z'(0) = —2/(0). Further, 

by the second relation in (3.5), we must have that 2/(0) = -X T E. Thus we can choose 

L(t) - g(r)X T E, (3.11) 
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where g(r) is any function satisfying g(0) = and <?'(0) = — . 

To sum up, given any matrix E € 73c, we can define the following update scheme, 

W= -(I n -XX T )E, 

J{t)=I p + ^W t W + 9 {t)X t E, (3.12) 
Y(t; X) = (2X + tW)J(t)- 1 - X. 

For simplicity, we may choose g{r) = t/2 in (3.12). See §4.2 and §6.4 for other possible choices of g(r). If there 
is no confusion, we will write Y(r;X) as Y(t) and J(t) as J, etc. 

A few remarks on the framework and the direction E are made here. Firstly, it follows from (3.7) and (3.8) 
that the framework can unify the famous polar decomposition or QR factorization update scheme corresponding 
to Z'(0) = 0. Meanwhile, it can yield more general polar decomposition or QR factorization update schemes 
by choosing different Z'(0). We will mainly consider the new update scheme (3.12) in the remainder of this 
paper. Secondly, like unconstrained optimization, there are many possible choices for E so that for example the 
gradient descent, conjugate gradient, or quasi-Newton direction is used under the update scheme (3.12). We 
will focus the gradient descent direction in §3.1 due to its simplicity. 

3.1 Choice of E 

In this subsection, we consider to seek an appropriate E such that the update scheme Y(t;X) given by (3.12) 
defines a descent curve. 

Lemma 3.1. For any feasible point X € S n>p , consider the curve given by (3.12), where E = D p , 

D p = (I n -(l-2p)XX T )VT = G-X(2pG T X + {l-2p)X T G), p>0. (3.13) 
Then the following properties hold: 

(i) Y(t) t Y(t)=I p ; 

(ii) Y' (0) = —D p is a descent direction and 

dF(Y{r)) 
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<-mm{p,l}\\VF\\ 2 F ; 



T=0 



(Hi) for any r > and Q T Q = I p , Q £ W xp , we have YQ ^ X; 
(iv) cond(J) < 1 + ^\\W\\% + ^\\X T D p \\ F . 

In particular, if p = n, Y(t) = X(2J^ 1 — I n ); if p = 1, the matrices X, Y, G reduce to the vectors x, y, g, 
respectively, and the update scheme becomes 

( 2 + tx t g \ t 
v(t) = 5 1 la; ^ q. 

Vl + ^(5 T 5-(^ T 5) 2 ) J l+ T -(9 T 9-{x T 9) 2 ) 

Proof. The fact that X T D p is skew-symmetric implies that J is invertible. (i) follows from the construction of 
the update scheme (3.12). Meanwhile, we know that Y'(0) = -D p . Noting that X T VF = X T G - G T X, it is 
easy to verify that 

(G T X,X T VT) = -(X T VT,X T VT)/2. 
This, with the chain rule, G = V.F + XG T X and the definition of D p in (3.13), implies that 

T' r (Y(0)) = (G,Y'(0)> = -<V-F,(J„ + (p-l)XX T )VF) < -min{p, 1}||V^|||. (3.14) 

So (ii) is true. We prove (iii) by contradiction and assume that there exists a p-by-p orthogonal matrix Q such 
that Y = XQ. Then multiplying X T from both sides of Y = XQ and using (3.12) and X T W = 0, we get 
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that 2J- 1 - I = Q; i.e., 21 - J = QJ. It follows from (21 - J) 7 '(21 - J) = (QJ) T QJ and Q T Q = I p that 
J T + J = Alp, which is a contradiction to the definition of J. The contradiction shows the truth of (iii). 

For (iv), it is obvious that || J|| 2 < 1 + xll^Hl + §II- x ' T - d pIIf, where |j • \\ 2 is the spectral norm. By the 
definition of J and X T D p + DjX = 0, it is easy to verify that 

r 2 /r 2 1 \ T /r 2 1 \ 

J T J = I P + —W T W+ W T W + -tX t D p ) {— W T W + -tX t D p J, (3.15) 

which implies A m i n (</ T J) > 1. Thus it follows that 

cond(J) = Am . n ffi J)1/2 < II J\U < 1 + ^\\W\\% + \\\X T D p \\ F . 

In the case that p = 1 or n, it is not difficult to simplify the corresponding update schemes and we omit the 
details here. Thus, we complete the proof. □ 

Before we proceed, several remarks on Lemma 3.1 are in order. Firstly, there are two special choices of p. 
One is p = 1/2, yielding Y'(0) = -VJ 7 ; the other one is p = 1/4 yielding Y'(0) = -G + Xsym(X T G). Note 
that — V3F and — G + Xsym(X T G) are the projections of — G onto the tangent space at X under the canonical 
metric and Euclidean metric, respectively. Without otherwise specification, we will always choose E = D p in 
the remainder of this paper. In this case, by (3.12) and the definition of D p , we must have that 

W = -(I n - XX T )D P = -(!„ - XX T )G. (3.16) 

Secondly, recalling that the Grassmann manifold Q n .p is defined as the set of all p-dimcnsional subspaces of 
an n-dimcnsional space, any two orthogonal matrices whose columns span the same p-dimcnsional subspace 
can be regarded as the same point in G„ tP . By (iii) of Lemma 3.1, we know that Y(r) with r > and X 
must be different points in Q n ^ p . Thus our update scheme can be also used for optimization on the Grassmann 
manifold. Finally, statement (iv) of Lemma 3.1 indicates that the condition number of J can be controlled by 
the parameter r, as will be addressed in Theorem 4.7. 

In the above, we mention the normal choice E = D p . In practice, we may seek an appropriate low-rank 
matrix E for a trade-off between the low complexity and the good search curve. As we know, calculating the 
inverse of a p-order matrix needs 0(p 3 ) flops in general. So when p rj n, it is quite expensive to calculate 
J -1 directly. In this case, we may construct low-rank matrices E so that J -1 can be cheaply obtained. The 
following lemma shows the possibility of choosing a rank-2 matrix for E, with which J -1 can be analytically 
given and fast computed. Similarly, we may also form a rank-2r matrix E with 1 < r < n/2 in the same vein. 

Lemma 3.2. For any feasible X G S„ iP , define = G^ef — X^GT^X . Consider the curve Y(t) given by 
(3.12) with E = D^ q \ where q is the column index given by 

g:=argma^ li ... iP (G,DW>. (3.17) 

Then we have that 

J- 1 =I P - e q e T q + (1 + a)-\-X^ q G {q) + e q ){X 7 q G {q) + e q ) T , (3.18) 
where a — x^(g)(^P ~ ^(q)^(q))^'(q) ' Moreover, Y(t) is a descent curve satisfying 

2p 

Proof. With the definition of and X T X = I p , we have that 

W = -(I- XX T )D^ q) = -(I - XX T )G {q) e T q 

and 

X T Di«)=X T G {q) eT-e T q Gj q) X. 
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Hence, J can be rewritten as 

J = I p + ie q e T q +be T q -e T ql (3.19) 

where £ = jG^(/-1J t )G( 9 ) = a — b T _ q b- q and b = ^X T G^ q y Here b- q is the vector by replacing the q-ih 
elements with in b. The SMW formula gives 

(J p + £e g e^ + - e^)- 1 = I p - e q e T q + (1 + £ + t^L^^L, + e g )(6_ 9 + e ? ) T . (3.20) 

Thus (3.18) follows from (3.19), (3.20) and £ = a - b1 q b^ q . 

Moreover, notice that V.F = Sf=i • This, with the choice of q in (3.17), D 1 / 2 = VJ-" and (ii) of Lemma 
3.1, indicates that 

PK(Y(0)) = -p{G,DM) < —(G, VJ 7 ) < —WVnl, 
which completes the proof. □ 



For the linear eigenvalue problem or for problem (1.1) with p = 1, we further have that X T G = G T X. In 
this case, our new update scheme (3.12) can be viewed as a generalized gradient projection scheme. 

Lemma 3.3. Assume that X T G = G T X for any feasible point X 6 S n . p . The update scheme (3.12) with 
E = Dp can be expressed by 



Y(t) = V Sn , p IX [I + tX 1 G-—G 1 (I n - XX 1 )GJ - tG 

Proof. The condition that X T G = G T X implies that X T D p = 0. So by (3.12), J = I p + ^W T W. Rewrite the 
Y(t) in (3.12) as Y(t) = (X(2I - J) + tW) J' 1 . It follows from this and Y(t) t Y{t) = I p that 

(X(2I - J) + tW) T {X(2I - J)+ tW) = J 2 . 

Thus similarly to (2.4), we see that Y(t) is the solution to 

min \\Y(t)-(X(2I-J)+tW)\\ f , s.t. Y(t) t Y(t) = I p . 

r(r)GR" X! » 

Therefore we can have that 

y(r)=Vs n , p (X(2I-J) + tW) 



= ^5„, P I X [I + tX 1 G-—G 1 {I n - XX 1 )Gj - tG 
where the second equality uses W = —{I n — XX T )G from (3.16). This completes the proof. □ 
3.2 Comparison with the existing update schemes 

To begin with, we address a relationship between the update scheme (3.12) and the Wen- Yin update scheme. 
We show that the Wen- Yin update scheme can be regarded as a special member of the update scheme (3.12) 

■7" 

with g(r) = — under the condition that I p + jX T D is invertible. Notice that this invertible condition is 
indispensable for the well definition the Wen- Yin update scheme, but is not necessary for the validity of the 
update scheme (3.12). 



Lemma 3.4. Assume that T = 
invertible and 



T21 T22 



where T 22 and T n — 7i 2 r 22 1 T 21 are invertible. Then T is 



T -i 



(T 1 T 1 T 1 ^ l ,r F \ 1 (T T* r p—^- r p \ — l r n rp—1 

^12^22 1 2l) K 1 !! L Vl Lt 12 1 2l) ^12^22 

Ti — 1 ryi / rji rri rrt — lm \ — 1 rri — 1 ryi / rri rri rri — Lrji \ — 1 rri rri — 1 , rri — 1 

22 J 2l(.Jll — ^12^22 J 21j 1 22 1 21 K 1 11 — 1 12 J 22 1 21 ) 1 12 1 22 + 1 22 



s 



Proposition 3.5. For any feasible point X G S n , P , if the tangent direction D G Tx is such that 1^ 



t \X T D is 

T 

invertible, the update scheme (2.3) is well-defined and it is equivalent to the update scheme (3.12) with g(r) = — 
and E = D . 

Proof. Consider the update scheme (2.3) with U = [P X D, X] and V = [X, -P X D}. Using P x = T n - \XX T , 
X T X = I p and noticing that X T D is skew-symmetric, we can rewrite 



l2p 



-V 1 U = 



Ip + \X T D 



In 



£ T 

2 P 



For convenience, denote I 2p + ^V T U = 
X T X = l p , 9 {t) = T - and W = -P X D 



where Tjj G 



0X P 



Til T12 
T21 T22 

hXX T D, the J in (3.12) can be expressed by 



G {1,2}). Using the relations 



-X 1 D 



D T PlP x D. 



As I p + jX T D is invertible, we have that Tn - T 12 T 22 1 T 2 i = (l 
3.4 that I 2p + ^V T U is invertible and the update scheme (2.3) ii 

Further, by Lemma 3.4 and (3.21), we have that (I 2p + ^V T C/) _1 



\X T D) 



(3.21) 



J. Thus it follows from Lemma 



Mn M12 
M 2 i M 22 



where 



M 11 = J-\I P + $X 1 D), M 12 = -|J- 1 , 

A/2i = Wp- ( J p + jX T D)J-\l p + ^X T D)), M 22 = {I p + \X T D).r 



Direct calculations show that 



M 11 + hl 12 X T D = J-\ 



t M 2 i + -M 22 X* D\ =2Ip — 2[I p + -XD)J 



By the above relations, (2.3), X T X = I p and P X D = \XX T D - W, we can obtain 



Y wv (t) =X-t[P x D,X] 





A/12 " 






. M 2 i 


A/22 _ 







X 



tX[M 21 + -M 22 X 1 D 



-[-XX 1 D 



w 



-M 12 X' D 



= (2X + tW)J~ 1 -X, 
which completes the proof. 



□ 



For a fixed feasible point X G S„ tP , assume that the gradient G has been computed. We now compare the 
computational costs of several constraint preserving update schemes. To do so, we review the computational 



costs of some basic matrix operations. Given A G R" lX " 2 and B G 



12 , computing AB needs 2n 1 n 2 l 



flops while computing A T A only needs n x n 2 flops. Computing qi(A) by the modified Gram-Schmidt algorithm 
needs 2n 1 n\ flops. If B is symmetric, computing the eigenvalue decomposition B = PY,P T with orthogonal 



P G M™ 2X ™ 2 and diagonal E G R n2Xri2 by the symmetric QR algorithm needs 9n% flops. If B is skew-symmetric, 
computing the exponential of B needs lOn^ flops. If B is symmetric positive definite, computing B 1 ! 2 needs 
10?i.2 flops. In addition, if B is nonsingular, solving the matrix equation BT — A T by the Gauss elimination 
with partial pivoting needs 2riin| + 2ri2/3 flops. It follows that computing B^ 1 needs 8713/3 flops. We refer 
interested readers to [20] for more details. 

In the following computational cost analysis, we take the new update scheme (3.12) and the Wen- Yin update 
scheme as examples. Notice that the 0(p 2 ) term is omitted for 1 < p < n comparing with 0(np 2 ) or 0(n 3 ). 
Consider the case of 1 < p < n. To analyze the computational cost for (3.12), since E = D p and hence 
W 



-G + XX T G, we can rewrite the feasible curve as 
Y{t-X) = (x(-I p + X T G 



(rj(r)- 1 ) 



X, 
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Tabic 3.1: Computational cost for different update schemes 



different update schemes 


1 < p < n 


p = 


= 1 


P = 


- n 


first r 


new r 


first r 


new r 


first t 


new r 


gradient projection (2.5) 


"inp* + 2np + 32p 3 /3 


3np 2 + 2np + 32p 3 /3 


5n 


5n 


41n 3 /3 


41n 3 /3 


geodesic (2.1), D — D p 


lOnp 2 + 2np + 80p 3 


4np 2 + Tip + 80p 3 


7n 


3n 


14n 3 


12ti 3 


polar decomposition (2.7), D — D p 


7np 2 + 3np + 32p 3 /3 


2np 2 + 2np + 32p 3 /3 


7n 


3n 


53ti 3 /3 


38ti 3 /3 


QR factorization (2.8), D = D p 


Qnp 2 + 3np 


2np 2 + 2np 


7n 


3n 


6ti 3 


2ti 3 


Wen- Yin (2.3), D = D p 


9np 2 + 2np + 40p 3 /3 


Anp 2 + rip + 40p 3 /3 


7n 


3n 


67n 3 /3 


52ti 3 /3 


Wen- Yin (2.3), D - D 1/2 


7np 2 +np + 40p 3 /3 


4np 2 + np + 40p 3 /3 


7n 


3n 


61n 3 /3 


52 3 /3 


New (3.12), D = D p 


7np 2 -(- 2np + 5p 3 /3 


2np 2 + 3np + 2p 3 /3 


7n 


3ti 


14n 3 /3 


8n 3 /3 



where 

2 

J(r) = 7 P + ^-(G T G - G T 11 T G) + y (A T G - G T X). 

Forming J(r) needs 3np 2 + p 3 flops involving computing X T G, G T G and G T XX T G. The final assembly for 
K(r) consists of involving solving one matrix equation and performing one matrix multiplication and two matrix 
subtractions, and hence needs Anp 2 + 2np + 2p 3 /3 flops. Therefore, the total cost of forming Y(t; X) in (3.12) 
for any r is 7np 2 + 2np + 5p 3 /3. While doing backtracking line searches, we need to update Y(t;X) for a 
different r. Denote the first trial and the new trial stepsizes by r id and T new , respectively. It is easy to see that 

Y{r new ;X) = (x(—I p + X T G) - G+(— — )x) (r new J{T new )- 1 ) - X. 

V ^Told ' \T new Told' / 

As X^^-^Ip + X T G) — G is already computed in Y(r id; X), we store this matrix and hence only need 2np 2 + 
3np+ 2p 3 /3 to compute Y(T new ; X). 

To analyze the computational cost for the Wen- Yin update scheme (2.3), we see that it takes 3np 2 and 
2np 2 +np flops to form I2 P +^V T U and PxD p , respectively. The final assembly for Y{t) needs 4np 2 +np+40p 3 /3 
flops. Hence, the total cost for the Wen- Yin update scheme is 9np 2 + 2np + 40p 3 /3. When p = 1/2, we see 
that U = [G,X], V = [X, —G] in (2.3), which implies that there is no need to compute PxD\n any more and 
the total cost for forming the Wen- Yin update scheme can be reduced to 7np 2 + np + 40p 3 /3. The cost for 
updating Y(r) with a new r is Anp 2 + np + 40p 3 /3 since V T U can be stored and the main work is solving a 
matrix equation and the final assembly. In the case of p = 1 or n, the cost for the update schemes (3.12) and 
(2.3) can be easily obtained in a similar analysis. We omit the details here. 

In Table 3.1, we list the cost of the aforementioned update schemes without giving more details. In the 
table, the column "first r" gives the cost of computing a feasible point with the first trial stepsizc of r, whereas 
the column "new r" provides the cost of getting a new feasible point with a new stepsize of r. Tabic 3.1 tells 
us that the ordinary gradient projection actually has the lowest complexity cost while the scheme (2.8) based 
on the QR factorization and our update scheme (3.12) are strong candidates. Although by Proposition 3.5, 
our update scheme and the Wen- Yin update scheme are equivalent under some assumption, the former has a 
low complexity cost especially for large p, as shown in Table 3.1. As pointed in [4], however, the choice of the 
update scheme can affect the number of iterations required for solving the optimization problem. Hence, a lower 
complexity cost at each iteration does not imply a higher efficiency on the whole. Actually, seeking a constraint 
preserving update scheme which can find the global minimizer with high probability and at a fast speed is still 
an open problem. This remains under investigation. 

4 Adaptive feasible BB-like (AFBB) method and global convergence 

In this section, we focus on the adaptive feasible BB-like method and its global convergence. We also propose 
a strategy to control the feasibility error in practical computations. 
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4.1 Adaptive feasible BB-like method 

To provide an efficient scheme, we must also pay much attention on choosing the stepsize r in the constraint 
preserving update scheme (3.12) with E = D p . Although there is only one parameter, its choice proves very 
important to the efficiency of the scheme. Since BB-like methods need less storage and are very efficient for 
unconstrained optimization problems [10, 17, 39, 52] and special constrained optimization problems [6, 8, 9], we 
consider to use some BB-like stepsize in the scheme (3.12). 

Denote Sk-i = Xk - Xk-i and Yk-i = D p ,k - D Pi k-i, where Xk = Yk-\{rk-i\ Xk-i). Similar to the 
unconstrained case, we can get the large and short BB stepsizes as follows: 

„LBB _ (Sk-l,S k -l) T SBB _ \(Sk-l,Yk-l}\ 



In the numerical experiments in §6, we adopt the following alternative BB (ABB) stepsize [8] 

ABB _ / rl BB , for odd k; 

k \ t lbb , for even k. [ ' 

If J^\, Dp^-i and D p j~ is stored in updating Xk, we can get (Sk-i, Sfc-i) for free since by the feasibility of 
Xk and Xk-i, (Sk-i, Sk-i) = 4p — 4tr( J J Tj L 1 ). Thus, computing the ABB stepsize needs at most 6np flops. 

Although BB-like methods prove efficient for nonlinear optimization, its heavy nonmonotonicity in function 
values makes the global convergence analysis difficult. Up to now, the unmodified BB method is only showed 
to be globally convergent in the convex quadratic case [38] and the convergence is R- linear [11]. It is shown 
in [23] that the unmodified BB method needs not converge for some extreme eigenvalue problem, which is a 
special case of problem (1.1). In this paper, we consider to incorporate the ABB stepsize with the adaptive 
nonmonotone line search (see [8, 12]). 

In general, the Armijo-type line search condition can be expressed as 

H Y k{Tk)) <T r + &T k T' T {Y k {Q)), 

where T r is the so-called "reference function value" , satisfying T r > T k . Notice that if T r = Tk, the correspond- 
ing line search reduces to the classical Armijo line search, which is monotone; if T r = v&8XQ<i<mm{k,M-i} T{Xk-i), 
where M is some positive integer, it gives the nonmonotone line search proposed in [21, 39]. An adaptive strat- 
egy to update T r is introduced by Toint [47] for designing trust region methods and extended by Dai and Zhang 
[12] for designing line search methods. 

A basic feature of the adaptive strategy is that, no line searches will actually be done if the current best 
objective value is improved in at most every L iterations, where L is a preassigned positive integer. For 
convenience, denote by Tb es t the current best function value and by T c the maximum objective value since the 
value of J-best was found. Initially, we can set J> := +oo, J-best = J~ c = J-(Xq). The value T c is a candidate for 
updating J-best- This setting even allows Tk > J~o during early iterations. The following is a detailed description 
to update J-best and T c . 

if ^fc+l < Fbest, 

J~best = J~k+1, J~c = J~k+1, I = 0, 



else 



(4 2) 

T c = max{J' c ,Ji +1 },( = 1 + 1, 



if I = L, T r = T c , T c = Tk+i, I = 0, end 

end 

Now we describe the complete AFBB method. 
Algorithm 4.1. (An Adaptive Feasible BB-like Method) 
Step Give the starting point and initialize the parameters. 

(i) Given e, p > 0, < a, S < 1, e m i n , e m ax > 0, a sufficently large A > and set k := 0, I := 0; 

(ii) Pick up Xq and positive integer L, set T r := +oo, Tbest = T c := Tq; 
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(Hi) Compute D Pyk and Tq . 
Step 1 If\\D Ptk \\ F < e, stop. 

Step 2 Find the least nonnegative integer i k satisfying 

T{Y(5 lk T { k 1] ;X k )) < T r + Sa 1 "^ ^(Y k (0; X k )), (4.3) 

and set T k = <7 lfe Tju . 

Step 3 (i) Update the curlinear search direction: 

D P M = G k - X k (2 P GlX k + (1 - 2p)X k r G k ); 

(ii) Update the new point by update scheme (3.12) with E = D pk : 
X k+ i = Y k (T k ;X k ), T k+ i = T{X k+ i); 

(Hi) Update T r , T hest , T c by (4-2). 
Step 4 Calculate by a certain BB stepsize and set 

T ( k 1] = max {e min /\\D Pik \\ F , min { Tj [° \ mm{e max /|| J D p . fe || F , A}}}. (4.4) 
Step 5 k := k + 1. Go to Step 1. 

4.2 Convergence of the AFBB method 

We make the following assumption throughout this subsection. 

Assumption 4.2. J-(X) is differ entiable and its gradientT>J r (X) is Lipschitz continuous with Lipschitz constant 
L ; i.e., 

\\VT(X) - VT{Y)\\ F < L \\X - Y\\ F , for all X,Y £ S n>p . 

Now we introduce well approximation condition, which plays an important role in the convergence analysis. 

Definition 4.3. An update scheme Y(t) is said to satisfy the well approximation condition if there exist 
nonnegative constants L\ and Li such that for all k > and < t k < , 

\\Y k (t k ) - Y k (0)\\ F < LMDpAp, \\Yl(t k ) - Y k '(0)\\ F < L 2 t k \\D p , k \\ 2 F . (4.5) 

The following lemma indicates that the update scheme (3.12) with E = D Ptk satisfies the well approximation 
condition. 

Lemma 4.4. The update scheme (3.12) with E = D p ^ k satisfies the well approximation condition with L\ := 
1 + fmax/2 and L 2 := 2 + e max /2. 

Proof. Note that Y fe (0) = X k . It follows from (3.12) that Y k (t k ) - Y k (0) = -{t k D p<k + %X k ■ W^W k ) J k {t k )-\ 
Since X k X k = I p , we get that 

||n(*fc) - Y k (0)\\ F < t k \\ J k (t k )-%(\\D p , k \\ F + |||^ fe || 2 F ). (4.6) 

The relation A min ( J k (t k ) T J k (t k )) > 1 implies that || Jfe(t fc ) _1 ||2 < 1- Using this, t k < rj^' < e max /\\D Pik \\ F and 
||Wfc|| F < \\D p , k \\ F which follows from W k = -(!„ - X k X^)D Ptk in (4.6), we obtain 

\\Y k (t k ) Y k (0)\\ F < l±^ tk \\D P M\ F , (4.7) 
which implies that the first part of (4.5) holds. 
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Moreover, it follows from (3.12) that (X k + Y k (t k )) J k (t k ) = 2X k + t k W k . Differentiating its both sides yields 
Ifttfc) = ~\( x k + y k (t k ))(X^D Pik + hWfW^Jkitk)- 1 + W k J k {t k )' x 

= -D p , k J k {t k )- 1 - ^(Y k (t k ) - Y^X^D^Mtk)- 1 - U k {X k + Ykit^W^WMk)- 1 , 
where the second equality is from W k = —Dp.k + X k Xj, D pk . Noting that Y k (0) = —D Pik , we have 
-*k(0) = -D p>k {J k (t k )- 1 -I p )--t k {X k + Y k (t k ))W k T W k J k (t k )-^ 

(4.8) 

By the definition of J k (t k ), we know J^t^-Ip = - J k {t k )- 1 W%W k +%X%D p ^ . This, with \\J k (t k )-% < 
1, HWfcllir < \\D p , k \\ F , t k < e max /\\D p , k \\ F and X^X k = I p , gives 

II MU)- 1 - I p \\f < 2 + e ™ x t k \\D p , k \\ F . (4.9) 
In addition, it follows from X k + Y k (t k ) = (2X k + t k W k ) J fc (tfe) _1 that 

(X k + Y k (t k )) T (X k + Y k {t k )) = U k {t k )- T (l p + ^W^Wk) J k (t k )-\ (4.10) 

The relation (3.15) indicates that J k {t k ) T J k (t k ) y I p + '-fW k W k . By this and (4.10), we can get that \\X k + 
*fe(ifc)||2 < 2. This is because, if Si, S 2 G S n and SihS 2 >- 0, wc must have that A max (5f 1 S 2 ) < 1. Therefore, 
using the Cauchy inequality, (4.8), \\X k + Y k (t k )\\ 2 < 2, ||W fe || F < ||D„, fc ||j, (4.7) and (4.9) in (4.8), we can 
obtain 

\\Y k \t k ) -Y k ^ F <^±^t k \\D P M\ 2 F , 
which implies the truth of the second part in (4.5). The proof is complete. □ 
Since T>J-(X) is Lipstchitz continuous in the compact set S„ tP , there exists a constant C3 > such that 

WDF{X)\\ F < c 3 , for all X G S„, p . 
We now show that the stepsize r k is bounded below. 
Lemma 4.5. If r k = o- lk T^ satisfies the Armijo condition (4-3), then 

r k > min{ c,rP>, 
where c - Ml^minj^i} c , - 4+e m , 2+e max r 

Proof. Note that < ife < r^ 1 ' 1 . It follows from the chain rule that 

T' T (Y k (t k )) - J*(Y k (0)) = (VT(Y k (t k )),Y'(t k ) - 3^(0)) + {VT(Y k {t k )) - VF(Y k (0)),Y k \0)}. 
By ||2?.F(X)|| F < C3, the Cauchy inequality, the Lipschitz continuity of T>F(X) and Lemma 4.4, we have 

\K(Y k (t k )) - K(Y k (0))\ < c 3 \\Y k '(t k ) - Y k '(Q)\\F + L \\Y k (t k ) - y fc (0)|| F ||£» Plfc || F < C4i fc ||L> M || F . (4.11) 
By the Taylor expansion and (4.11), we can obtain 

HY k (t k )) - HY k (0)) = -t k (G k ,D p . k ) + f \T' T (Y k (t)) - J*(Y k (p))]dt 

Jo 

< -t k (G k ,D p . k ) + tc 4 \\D p , k \\ 2 F dt 
Jo 



MGk,D p , k ) + ^c 4 t 2 k \\D p , k \\ 2 F . 
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It follows from D pM = (I n - (1 - 2p)X k X k r )VT k that \\D p ^ k \\ F < max{2p, l}||V.Ffc||.F and hence 

T(Y k (t k )) - T(Y k (0)) < -t k (G k ,D Ptk ) + ic 4 4max{2 (0 ,l} 2 ||VJ' fe |||„ (4.12) 

It is possible that the first trial stepsize rj^ is or is not accepted by the Armijo condition (4.3). If not accepted, 
we must have r k o~ l < t£ and 

F(Y k {T k cj- 1 )) > T r + 5T k a- l F' T {Y k (0)) > F{Y k {<S)) - St^ 1 (G k , D P:k ) , (4.13) 

where the second inequality is due to ^(^(O)) < T r . Combining < a < 1, —(G k , D Ptk ) < — min{p, 1}|| VJ^H^, 
(4.12) corresponding to t k = T k a~ Y and (4.13), we obtain r k > c. Thus in either case, r k > min{c, Tju }. This 
completes the proof. □ 

In the following, we establish the global convergence result of Algorithm 4.1 in numerical sense. We remark 
that for a, b G K, the relation a < b in real computations means that a < b — e m ach, where e ma ch is the machine 
precision. 

Theorem 4.6. Let {X k : k > 0} be the sequence generated by Algorithm J^.l with e = 0. Then we have either 
D p k = for some finite k or 

liminf||D p>fc || F = 0. (4.14) 

A;— >-oo 

Proof. First, we denote F r ,F c ,Fbest,l at the fc-th iteration by F^,T^,F^ est ,l k , respectively. From Lemma 
4.5, we know that r k > min{c, t[ } for all fc, which means that a stepsize r k satisfying (4.3) can be found 
after finite number of trials. It follows from (4.3), the second statement of Lemma 3.1, r k > min{c, t^} and 
\\D Ptk \\ F < max{2p, l}||VJ" fe || F that 

Fk+i <?r - 6c 5 \\D Ptk \\ F , (4.15) 

w^re c 5 = ^gfeffl^ - 

Assume that D p k = will not happen after finite iterations. We consider two cases. The first case is that 
l k < L for all large k, which means that J r ^, st can be updated for infinite number of times. Thus, there exists 
an infinite subsequence {X ki : fcj > 0} such that 

F{X ki+1 ) < F(X ki ), k l+ i > ki, 

which means that F{X ki+i ) < J-(X ki ) — £ mac h in real computations. This contradicts that T is bounded below 
on the feasible set. So this case will not happen in real computations. 

The second case is that l k = L for infinite times. Define the infinite set JC := {ki : l ki = L}. Assuming that 
the relation (4.14) is false, there must exist cq > such that ||-D p ,fc||_F > c§ for all sufficiently large k. Then we 
can obtain by (4.15) that 

Tj < T kz - ei, for all ki < j < k i+1 , (4.16) 

where e\ = Sce """^'^{^p"?}^ 6 ' 6 "'"'^ ^ s a positive constant. Moreover, by the definition of J>, we know that 
J r r t+1 < max. ki< j< ki+1 Fj, which with (4.16) implies that for all ki G JC, 

F k ' +1 < F k ' - ei. 

Since 1C is an infinite set, the summation of the above relation over IC leads to a contradiction to the boundedness 
of T{X) in the feasible set. Thus (4.14) must be true. This completes the proof. □ 

Several remarks on the convergence are made here. Firstly, we can show the well approximation condition 
for the gradient projection and the polar decomposition update schemes as well. Thus similar convergence 
results can be established for the two schemes in the framework of Algorithm 4.1. Secondly, Theorem 4.6 still 
holds for nonlinear <?(t)'s provided that |<?(t)/t| is bounded above. 

As addressed in statement (iv) of Lemma 3.1, the condition number of J k can be controlled by the stepsize 
r k . Here, we conclude this subsection by giving the concrete upper bounds of the condition number of J k , which 
makes inverting J k stable in practical computations. 
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Theorem 4.7. Let {X k : k > 0} be the sequence generated by Algorithm 4-1 with e = 0. Then the condition 
number of J k is bounded, 

1 + EmaxA if < v k < 1; 
(5 + 4ax)/4, if v k > 1, 

where v k = T k \\D p ^ k \\ F . Moreover, we have either cond(Jfc) = 1 for some finite k or liminffc_ i . 00 cond(J/ £ ) = 1. 



cond(Jfe) < 



Proof. Define hp(t) = ^-t 2 + \\J\ — t 2 , where < t < 1. It is easy to verify that 

. ,^ f 0/2, if <[3< 1; 

max ha (t )= { ' „. .. ~ ~ (4.17) 

o<t<i pw \ (l + ,3 2 )/4, if p > 1. v ; 

It follows from -D M = W fc - X k X^D p>k and XjlT fc = that ||D ftfc ||| = ||W fe ||| + ||-XjZ>p,jklll- Noticing that 
Vk = T k \\D p ^ k \\ F < £max, by statement (iv) of Lemma 3.1, the definition of hp(t) and (4.17), we know that 



cond(Jfe) < 1 



\W k \\ 2 F , v k \\X^D Ptk \\ F 



< 



4 \\D Pik \\% 2 \\D p , k \\ F 

l+v k /2, if 0<v k < 1; 
(5 + ^)/4, if Ufc > l, 

1 + fmax/2, if < V k < 1; 
(5 + dax)/ 4 . i f W fc > 1- 



< \ J 2 ?u Z A ' (4.18) 



Moreover, we know from (4.4) that the sequence {r^ : fc > 0} is bounded above. With this, v k = T k \\D Ptk \\F 
and Theorem 4.6, we have either u& = for some finite k or liminffc_>. 00 v k = 0. Combining this with the second 
inequality in (4.18), we have either cond(Jfe) = 1 for some finite k or liminf / t- i . 00 cond(Jfc) = 1. This completes 
the proof. □ 



4.3 Controlling feasibility errors 

The update scheme (3.12) is so constructed that Y(t) t Y(t) = I p always holds. In practical computations, 
however, the orthogonality constraint may be violated after several iterations. As will be seen, it is mainly due 
to the numerical errors occured on the multiplication X T W, which should be exactly equal to zero in theory. 
In the following, we give a detailed analysis for this phenomenon and then propose a strategy for controlling 
the feasibility errors. 

Now we assume that all the arithmetics are exact, but the orthogonal constraint is violated at X; i.e., 
X T X I p . Denote E. x = X T X — I p , Sy = Y T Y — I p . It is easy to verify from the definitions of J that 

(2J- 1 - I p ) T (2J- 1 - I p ) + t 2 J- t W t WJ- 1 = I p , (4.19) 

which implies that 

II2J- 1 -Jp||a < 1. (4.20) 

Rewrite Y(t) in update scheme (3.12) as Y(t) = X(2J~ 1 —I p )+tWJ~ 1 . This, with (4.19) and X T X = E x +I p , 
implies that 

E Y = (2J- 1 - I P ) T E X (2J- 1 - Ip) + t(2J- 1 - I p ) T X T WJ- 1 + tJ~ t W t X{2J~ l - I p ). (4.21) 

It follows from W = -(!„ - XX T )D p in (3.12) and the definition of E x that X T W = E x X T D p . Then by the 
Cauchy inequality, (4.21), \\2J- 1 - I p \\ 2 < 1, || J~ l \\ 2 < 1 and r||L> p || F < e max , we know that 

\\S y \\f < (l + 2T\\X T Dp\\ F )\\E x \\ F < (l + 2(l + A max (S x ))e m ax)||Sx||F, (4.22) 

which shows that the feasibility of Y may be out of control after some iterations. 

To control the feasibility errors, we propose an approach, which is to replace the matrix W in the scheme 
with 

W = -(In - X(X T X)- 1 X T )G = -{I n - X(X T X)- 1 X T )D p . (4.23) 
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Figure 4.1: The error of feasibility during iterations 

Notice that when X T X is exactly 7 p , W reduces to — (I n — XX T )D p which is identical to the original W in 
(3.12) corresponding to E — D p . Denote by Y the new Y computed by using W and define Sy = Y T Y — I. 
Noting that X T W = 0, similar to the deriving of (4.21), we obtain that 

Z Y = (2J- 1 - I) T E X (2J- 1 - I), 

which with (4.20) implies that ||Sy||f < ||Sx||f- Thus, with np 2 + 8p 3 /3 extra flops for computing the W in 
(4.23), we can efficiently control the cumulative feasibility errors. Considering that X T W is not exactly zero in 
practical computations, we may reasonably assume that \\X T — X T X(X T X)~~ 1 X T \\p is the same order of the 
machine precision £ mac h- Then we have ||X t M^||_f < 0(e m acii)||£>p||F- Following the analysis of deriving (4.22), 
we have that 

\\%'\\f < \\^x\\f + 2TO{e mach )\\D p \\ F , 

which significantly improves the bound in (4.22). 

To illustrate the usefulness of the above strategy, we takes a typical example, which is to calculate the sum of 
the four largest eigenvalue of matrix "S3DKT3M2" from UF Sparse Matrix Collection [14]. We tried Algorithm 
4.1 with parameters chosen in §6.1 by using the original matrix W and the W in (4.23), respectively. Figure 
4.3 depicts the error of the feasibility versus the iteration number. From this figure, we see that the feasibility 
errors during the iterations arc efficiently controlled by using W indeed. 

5 Several extensions 

In this section, we mention some extensions of the constraint preserving update scheme (3.12) and the AFBB 
method. At first, we consider the following optimization problem with the general orthogonality constraint, 

min F(X), s.t. X*HX = K, (5.1) 

where H £ W nxn is a symmetric positive semidefinite (not necessarily symmetric positive definite) matrix and 
K £ W xp is a symmetric positive definite matrix. The first-order optimality condition of problem (5.1) is as 
follows. 

Lemma 5.1. Suppose X is a local minimizer of problem (5.1). Then X satisfies the first-order optimality 
condition V T := G - HXG*XI<- 1 = 0. 

Similar to the analysis in §3, we can give the following feasible update scheme for (5.1). 

Lemma 5.2. For any feasible point X with X*HX = K, denote E = GX*H 2 X - HXG*HX, W = -(!„ - 
XK- 1 X*H)E and J(t) = K + ^W*HW + \tX*HE. Consider the curve given by 

Y{t) = (2X + tW)J" 1 K - X. 

Then we have that 
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(i) Y{t)*HY{t)=K; 

(ii) Y'(0) — —E is a descent direction and J-' T (Y "(0)) < — %fwp || V.F|| f; 
(tiij cond(J) < cond(A) + iX J* (K) \\W*HW\\ F + 2X J AK] \\X*HE\\ F . 

Proof. Since X*HE is skew-Hermitian, J is invcrtible. (i) is obvious. Moreover, it is not difficulty to verify 
that Y'(0) = —E. This, with the chain rule and the definition of E, indicates that 

T' T (Y(0)) = (G,Y'(0)) = -(GX*H,GX*H - HXG*) = --\\GX*H - HXG*\\%. (5.2) 

Using the Cauchy inequality and VJ 7 = (GX*H — HXG*)XK which is from the definition of VJ 7 , we can 
obtain 

W^Hf < } X Y T ,A \GX*H - HXG*\\ F . 

Amin (A ) 

This, with the relation (5.2) implies that 

W(0))<-^g^l|v^|,. 

So (ii) is true. 

For (iii), it is obvious that ||J|| 2 < \\K\\ 2 + ^-\\W* HW\\ F + ^\\X* HE\\ F . Denote J= K'^JR- 1 / 2 . Using 
the definition of J and noticing that X* HE is skew-Hermitian, it is easy to verify that 

J*J = I p + —K- 1/2 W*HWR- 1/2 + K- 1/2 (—W*HW + -tX*HE) K' 1 (-W* HW + -tX* HE) K~ 1/2 , 



which implies that A m ; n (J*J) > 1. This further indicates that A m ; n (J* J) > X m [ n (K) 2 . It is because, if Si,S 2 are 
positive semidefinite Hcrmitian matrices, we must have that AminCSiS^) — ^mm(Si)X ma x(S2)- Thus it follows 
that 2 

cond(J) = - Ywnxf2 ^ cond W + a \ T (W , W*HW\\ F + 9X T (fr J X*HE\\ F . 

Amin(J J) > 4A m in(A) 2A min (A ) 

Hence we complete the proof. □ 

With the help of the above lemma, we can design the AFBB method for problem (5.1) similar to Algorithm 
4.1. Global convergence results can also be established. 

We can also extend our scheme for optimization problems with multi-orthogonality-constraints: 

min F{X X , ■ ■ ■ ,X„), s.t. X{H 1 X 1 = K u ■ ■ ■ ,X*H q X q = A' , 

where Hi G R™ 1 xni , • ■ • , H q G W 1 " xrlq are symmetric positive semidefinite matrices, and K\ G W 1 xpi , ■ ■ ■ , K q G 
l p ' xp « are symmetric positive definite matrices. Since the variables A 1; • • • , X p are separated in the constraints, 
the extension is natural. 



6 Numerical Results 

In this section, we present numerical results on a variety of problems to illustrate the efficiency of our AFBB 
method. We implemented AFBB in Matlab R2012a. All experiments were performed in Matlab under a Linux 
operating system on a Thinkpad T420 Laptop with an Intel® dual core CPU at 2.60GHz x 2 and 4GB of RAM. 
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6.1 Stopping criteria 



II Y X II f I 

Define tolfc := " k ^' lllF and tol£ := 1 ■ Wc terminate the algorithm if one of the following holds: (i) 

k > Maxlter; (ii) ||-D Pjfe || F < e||D Pj o||F; (in) tol£ < e x and tol{ < e f ; (iv) mean([tolfe_ min{fe T}+1 , • • • ,tol£]) < 
lOe^ and mean([tol^_ min |- fc T } +1 , • ■ • , tof^]) < 10e/, where Maxlter is the maximal iteration number. These 
criteria are the same as those used in [48] except that we replace ||VJ-fc||.F < e there by criterion (ii) here. 
Unless otherwise specified, we set e = 1CT 5 , e x = 10~ 5 , e f = 10~ 8 , T = 5, Maxlter = 3000. 
The other parameters for AFBB are given as follows: 

p = 0.5, a = 0.5, 6 = 0.001, e mi „ = lO" 20 , e max = 10 8 , A = 10 20 , L = 3. 

The initial trial stepsize Tq 1 -* at the first iteration is chosen to be 0.5 H-D^oIIf 1 - The ABB stepsize (4.1) is used 
for computing the trial stepsize . The "tic-toe" commands in Matlab is used to obtain the CPU time in 
seconds elapsed by each code. 

6.2 Nearest low-rank correlation matrix problem 

Given C £ S n and a nonnegative weight matrix H £ S n , the nearest low-rank correlation matrix problem is 
given by 

min -\\H {X - C)f F , s.t. diag(X) = e, rank(X) < r, X h 0, (6.1) 

where is the Hadama product operator of two matrices, e G R n is the vector with all ones, r < n is a given 
positive integer number. A usual weight is H = 1, where 1 £ M. nxn represents the matrix with all ones. 

To deal with the nonconvex rank constraints rank(AT) < r, as used in the geometric optimization methods 
[22], majorization method [34], and trigonometric parametrization methods [40], we rewrite X = V T V with 
V = [Vi, • • • , V n ] £ M. rxn . Consequently, we get the equivalent formulation of (6.1) as follows: 

min 9(V;H,C):=h\HQ(V T V-C)\\ 2 F: s.t. ||^|| 2 = 1, i = 1, • • • , n, (6.2) 

which is the minimization of a quartic polynomials over spheres. Among many approaches to solve problem 
(6.1), we compared our algorithm with several state-of-the-art methods; i.e., the majorized penalty approach 
(PenCorr*) [19], the sequential semismooth Newton method (SemiNewton) [28] , and the Wen- Yin nonmonotone 
BB method (OptM^) proposed in [48]. For more comprehensive literature reviews, see [19, 28]. 

We chose the initial point of problem (6.2) in the same way as in the majorization method. Specifically, 
we selected the modified PC A [18] of C; i.e., C pca , to be the initial point in AFBB or OptM. Let C have 
the eigenvalue decomposition C = P T AP, where P T P = I n , A ~ Diag(Ai,--- ,A„) with Ai > •■■ > A„. 
Define A r = Diag(Ai, • • • , A r ) and denote by Pi the first r columns of P. Then the i-th column of C pca is 

1 /2 

[C pca }(i) = Zi/\\zi\\, where Zi = Pi[A r i = 1, ■ ■ ■ , r. In case of Ai < for some 1 < i < r, we first solved the 
following problem by the semismooth Newton method [35] to obtain a symmetric positive definite matrix C , 

C = argmin xg<S n-||X-C|||, s.t. diag(A) = e, X y 0. 

Then we set the initial point as C pca . 

We list the test problems from [19, 24, 27, 28, 34] as follows: 

Ex.1 : The matrix C is the 387 x 387 one-day correlation matrix (as of Oct. 10, 2008) from the lagged 
datasets of RiskMetrics.-'' 

Ex.2 1341 : H = l,n = 500, the entries 

dj = exp ( - 7i i - j\ — — — 

V max{i,j}T3 - 74 |V«- yfiy 

*It can be downloaded from http : //www. math. nus . edu. sg/~matsundf/#Codes. 
tit can be downloaded from http://optman.blogs.rice.edu/. 
*Dr. Qin gna Li provided us this matrix kindly. 
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for i,j = 1,- • • ,n with 7i = 0,72 = 0.480, 73 = 1.511, 74 = 0.186. 

Ex.3 [191 : n = 500, the entries Cy = 0.5 + 0.5e" 05 l 4 ^l for i, j = 1, • • • ,n. The weight matrix H is either 1 

or a rand matrix whose entries are uniformly distributed in [0.1, 10] except for 200 entries in [0.01, 100]. 

[191 

Ex.4 : n = 943, C is based on 100, 000 ratings for 1682 movies by 943 users from Movielens data sets. 
It can be download from http://www.grouplens.org/node/73. The weight matrix H is either 1 or the one 

provided by T. Fushiki at Institute of Statistical Mathematics in Japan. 
[24, 27] <> 
Ex.5 -9 : We consider the five gene correlation matrices C: Lymph, ER, Arabidopsis, Leukemia, Hered- 
itary be. For the sake of comparison, as done in [24], we perturb C to 

C=(1- 7 )C + 7 F, 

where 7 £ (0,1) and F is a random symmetric matrix with entries uniformly distributed in [—1,1]. The 
corresponding weight matrix H is either 1 or the one created by Example 2 in [24]. 

In our implementation, all the parameters of each solver were set to be their default values except the 
termination criteria of OptM were changed to the ones in §6.1. The numerical results are reported in Tables 
6.1 and 6.2. In the two tables, "residual", "time" and "feasi" denote the residual \\H (X — C)\\p, CPU time 
and the violation of diag(AT) = e, respectively, and "nfge" stands for the total number of function and gradient 
evaluations. Note that CPU time of OptM and AFBB includes the CPU time for generating the initial point 
by the modified PCA. From the two tables, we know that AFBB performs better than OptM in terms of the 
solution equality, the CPU time and the number of function and gradient evaluations. Again, in the case when 
H = 1, AFBB not only runs considerably faster than SemiNewton and PenCorr, but also always find a better 
solution in terms of the residual except for the problem with large r; in the case when H 7^ 1, our AFBB shows 
great advantage over PenCorr in terms of the solution quality and CPU time. Apart from the ABB stepsizc, 
another reason for the efficiency of AFBB is its low complexity cost per iteration. The dominated cost of AFBB 
at each iteration is computing the function value and the gradient in (6.1). For special H = 1 and general H, 
the costs are 2n 2 r + 3nr 2 and 3n 2 r, respectively. In contrast, SemiNewton and PenCorr need to solve iteratively 
a series of least squares correlation matrix problems without the rank constraint, whose cost is very expensive. 

Furthermore, we consider the nearest low-rank correlation problem with extra requirements on its compo- 
nents as follows: 

mm XeS n ±\\H Q (X - C)\\% 

s.t. diag(AT) = e, rank(AT) < r, X±0, (6.3) 

Xij = Qij, 6 &e, 

where B e is the subset of , jr ) 1 1 < j < i < n} satisfying —1 < gy < 1 for any (i,j) £ B e . The requirements in 
problem (6.3) are imposed in some real-world applications from the financial industry. 

Using the same decomposition X = V T V, we obtain the equivalent formulation of (6.3) as follows: 

min VgM rx« 9(V;H,C) 

s.t. ||Vi||2 = 1, i = l,-" (6-4) 

y.'Vj 'in (i,i)eB e . 

However, our AFBB method can not preserve the constraints V^Vj — = 0, £ B e in (6.4). We then 

consider to use the standard augmented Lagrangian framework [32, 45] to handle this issue. Let \i > be the 
penalty parameter, A £ S n be the Lagrangian matrices associated with the constraints V^Vj—qij = 0, (i, j) £ B e 
in (6.4), respectively. Here, 

A = f Ay, or £ B e ; 
13 \ 0, otherwise. 

The augmented Lagrangian function is 

L fl (V,A):=6(V;H,C)+ ]T ^(Vf^ " Ay), (6-5) 

(i,j)eB e 

where ip^t, A) = \[it 2 — Xt. We can rewrite (6.5) as the following matrix form: 

All! 



Lp(V, A, A, A) = 6(V; H, C) + "-6(V; H e ,Q + A//x) - 



4// 
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Tabic 6.1: Numerical results of the nearest low-rank correlation matrix problem with H = 1 





SemiNcwton 


PenCorr 


AFBB 


OptM 


r 


residual time fcasi 


residual time fcasi 


residual time fcasi nfgc 


residual time feasi nfgc 


Ex.1, n = 387 


2 


1.629798c+02 3.57 3.8e-15 


1.622941c+02 16.78 9.9c-09 


1.623528c+02 0.11 3.0c-15 32 


1.623528c+02 0.10 3.0c-15 31 


5 


6.157168c+01 2.90 4.1c-15 


6.111010c+01 11.33 5.2c-08 


6.108805c+01 0.15 3.1c-15 80 


6.108805c+01 0.15 3.4c-15 90 


20 


6.087359c+00 2.05 4.8e-15 


6.066337c+00 3.45 6.3c-07 


6.063175c+00 0.26 3.3c-15 125 


6.063178c+00 0.29 3.4c-15 145 


40 


7.764528e-01 1.81 6.7o-15 


7.767936c-01 1.88 I.60-O8 


7.759036c-01 0.44 3.3c-15 151 


7.755939c-01 0.48 3.7c-15 166 


80 


3.227824c-02 1.52 9.0c-15 


3.261675c-02 1.33 1.4c-08 


3.265976c-02 0.46 3.3c-15 85 


3.301996c-02 0.44 3.4c-15 79 


Ex.2, n - 500 


2 


1.446640c+02 3.03 4.1c-15 


1.445524c+02 10.02 2.2c-07 


1.445515c+02 0.14 3.4c-15 15 


1.445515c+02 0.13 3.5c-15 14 


5 


4.116659c+01 2.67 4.1e-15 


4.112763e+01 8.75 3.3c-08 


4.112618c+01 0.17 3.3c-15 39 


4.112618c+01 0.16 3.8c-15 36 


20 


5.282393c+00 2.85 5.7e-15 


5.282298c+00 3.43 5.7o-08 


5.279807c+00 0.41 3.3c-15 132 


5.279816c+00 0.40 3.3c-15 127 


50 


1.342180c+00 2.28 8.1c-15 


1.343583c+00 1.74 6.4o-07 


1.340492c+00 1.21 3.7c-15 259 


1.340435c+00 1.46 3.5c-15 311 


100 


4.639703c-01 2.46 Lie- 14 


4. 645138o-01 1.81 1.2c-07 


4.640385c-01 1.62 3.5c-15 179 


4.640598c-01 2.29 3.7c-15 257 


125 


3.274712c-01 2.73 1.2e-14 


3.277983c-01 1.96 3.8c-08 


3.275699c-01 2.68 3.8c-15 239 


3.275667c-01 2.65 3.8c-15 234 


Ex.3, n - 500 


2 


1.579984e+02 2.83 4.1c-15 


1.564172c+02 22.39 4.3c-09 


1.563924c+02 0.16 3.4c-15 33 


1.563924c+02 0.15 3.5c-15 32 


5 


7.889257c+01 2.85 4.4e-15 


7.883423c+01 6.43 3.1c-08 


7.882875c+01 0.18 3.2c-15 49 


7.882875c+01 0.19 3.5c-15 56 


20 


1.570809c+01 2.32 5.3c-15 


1. 570796o+01 3.53 7.8c-08 


1.570689c+01 0.24 3.6c-15 56 


1.570688c+01 0.27 3.3c-15 67 


50 


4.139394e+00 2.82 6.5o-15 


4. 139455o+00 1.62 5.2o-07 


4.139192c+00 0.61 3.7c-15 114 


4.139207c+00 0.54 3.6c-15 94 


100 


1.466338c+00 2.19 8.8e-15 


1.466498c+00 1.69 2.4c-07 


1.466307c+00 1.05 3.8c-15 111 


1.466326c+00 0.93 3.6c-15 96 


125 


1.047950c+00 2.47 8.8e-15 


1.048114c+00 1.72 3.0c-08 


1.047966c+00 1.26 3.8c-15 107 


1.047948c+00 1.58 3.8c-15 135 


Ex.4, n = 943 


5 


4.135811e+02 25.49 6.2e-15 


4.128428c+02 106.65 6.5c-08 


4.127542c+02 0.91 5.1c-15 100 


4.127542c+02 0.98 5.2c-15 120 


20 


2.887727e+02 29.97 7.3e-15 


2. 887228o+02 54.68 3.2c-08 


2.886782c+02 2.40 6.1c-15 321 


2.886782c+02 2.49 6.0c-15 340 


50 


2.772868e+02 33.98 9.7c-15 


2.762742c+02 31.96 7.5c-08 


2.762732c+02 2.97 7.0c-15 209 


2.762733c+02 3.50 6.7c-15 252 


100 


2.760557c+02 34.99 1.2e-14 


2.757853c+02 5.88 1.7c-08 


2.757854c+02 2.99 6.7c-15 105 


2.757854c+02 3.60 6.7c-15 125 


150 


2.757926c+02 22.13 1.3c-14 


2.757853c+02 5.87 1.7c-08 


2.757854c+02 4.52 6.8c-15 107 


2.757854c+02 4.33 6.9c-15 100 


200 


2.757854c+02 27.22 1.2o-14 


2. 757853o+02 5.87 1.7c-08 


2.757854c+02 5.23 6.9c-15 93 


2.757854c+02 6.13 7.0c-15 109 


250 


2.757853c+02 37.31 1.3e-14 


2. 757853o+02 5.89 1.7c-08 


2.757854c+02 6.00 6.9c-15 83 


2.757855c+02 7.85 6.6c-15 109 


Ex.5, Lymph, n = 587 


2 


3.428835c+02 9.97 4.9e-15 


3.413643c+02 47.84 3.3c-09 


3.412429c+02 0.30 3.7c-15 95 


3.412429c+02 0.31 3.6c-15 116 


5 


1.802509c+02 8.04 4.6c-15 


1.797921c+02 40.68 8.I0-IO 


1.797057c+02 0.41 3.8c-15 136 


1.797057c+02 0.41 4.1c-15 142 


20 


6. 041712o+01 5.58 5.4c-15 


6. 038545o+01 17.02 5.1c-08 


6.037686c+01 0.38 4.5c-15 71 


6.037687c+01 0.41 4.7c-15 85 


50 


2.620417c+01 6.53 8.0c-15 


2.619999c+01 10.30 5.3c-07 


2.619778c+01 0.74 5.0c-15 105 


2.619780c+01 0.70 5.2c-15 96 


100 


1.474444c+01 4.78 1.0e-14 


1.474421c+01 7.14 7.0c-08 


1.474226c+01 1.00 5.6c-15 79 


1.474227c+01 1.10 5.4c-15 87 


150 


1. 219868o+01 4.57 1.2e-14 


1. 219858o+01 3.91 6.5o-09 


1.219828c+01 2.54 5.4c-15 135 


1.219842c+01 2.10 5.4c-15 107 


200 


1.156813c+01 6.45 1.6o-14 


1. 155880o+01 3.75 2.4o-08 


1.155904c+01 3.25 5.7c-15 117 


1.155919c+01 3.05 5.5c-15 109 


Ex.6, ER, n = 692 


2 


4.131904c+02 10.83 4.9c-15 


4.121766c+02 63.41 5.9c-10 


4.120227c+02 0.37 4.0c-15 80 


4.120227c+02 0.40 4.2c-15 105 


5 


2.110988c+02 11.78 5.1o-15 


2.099166c+02 120.52 7.3o-09 


2.097398c+02 0.61 4.5c-15 166 


2.097398c+02 0.61 4.6c-15 172 


20 


6.413592c+01 7.37 6.2o-15 


6.405544c+01 32.62 4.9o-08 


6.403579c+01 0.62 5.0c-15 105 


6.403580c+01 0.56 4.9c-15 89 


50 


2.845348c+01 8.89 8.2c-15 


2.844170c+01 18.12 3.1c-08 


2.843556c+01 1.31 5.7c-15 155 


2.843565c+01 1.41 5.5c-15 166 


100 


1.763079c+01 9.85 1.2c-14 


1.762974c+01 12.69 1.2c-08 


1.762548c+01 1.66 6.2c-15 93 


1.762556c+01 1.90 5.6c-15 106 


150 


1.516733c+01 6.21 1.4c-14 


1.516657o+01 7.27 1.7o-07 


1.516657c+01 3.00 5.5c-15 116 


1.516657c+01 3.77 5.7c-15 146 


Ex.7, Arabidopsis, n — 834 


5 


2.142944c+02 19.44 5.7c-15 


2.132022c+02 80.59 4.1c-09 


2.131530c+02 0.67 4.8c-15 92 


2.131538c+02 0.65 4.9c-15 89 


20 


5.877539c+01 16.30 7.1c-15 


5.858450c+01 64.79 3.4c-09 


5.855612c+01 1.16 5.1c-15 162 


5.855618c+01 1.14 5.3c-15 158 


50 


2.874451c+01 14.95 1.0e-14 


2.873997c+01 21.33 9.0c-09 


2.873723c+01 1.31 5.6c-15 97 


2.873734c+01 1.45 5.7c-15 110 


100 


2.100297o+01 15.99 1.2e-14 


2.100177c+01 20.48 2.0c-08 


2.099678c+01 2.35 5.8c-15 101 


2.099685c+01 2.97 5.9c-15 127 


150 


1.937204o+01 10.46 1.6c-14 


1.937153c+01 11.29 7.3c-08 


1.937159c+01 5.52 5.8c-15 164 


1.937154c+01 6.02 5.7c-15 179 


200 


1.898648c+01 15.17 1.8e-14 


1.894103c+01 9.50 1.9c-07 


1.894155c+01 7.49 6.0c-15 166 


1.894175c+01 7.28 6.0c-15 159 


Ex.8, Leukemia, n — 1255 


5 


3.317147c+02 62.86 6.6c-15 


3.309391c+02 198.33 2.0c-08 


3.308870c+02 1.87 5.7c-15 97 


3.308870c+02 1.87 5.6c-15 101 


20 


1.055349c+02 45.28 8.0c-15 


1. 054780o+02 126.67 1.6c-07 


1.054628c+02 2.48 6.7c-15 137 


1.054630c+02 2.75 6.6c-15 164 


50 


4.473242c+01 42.16 l.lc-14 


4.473089c+01 56.09 4.3c-07 


4.472953c+01 2.44 6.8c-15 69 


4.472947c+01 2.88 7.0c-15 91 


100 


3.273681c+01 30.11 1.5c-14 


3.273654c+01 49.53 3.8c-08 


3.273480c+01 5.66 7.0c-15 119 


3.273444c+01 5.67 6.9c-15 118 


200 


3. 098825o+01 48.54 2.2e-14 


3.094967c+01 26.69 6.9c-08 


3.095051c+01 8.82 7.0c-15 99 


3.095015c+01 12.30 7.4c-15 143 


250 


3. 090745o+01 50.49 2.6o-14 


3.080760c+01 27.41 8.0c-08 


3.080808c+01 11.48 7.0c-15 102 


3.080805c+01 13.26 7.1c-15 119 


300 


3.079624c+01 41.55 2.8c-14 


3.077701c+01 27.73 2.4c-08 


3.077741c+01 15.57 7.2c-15 117 


3.077732c+01 24.63 7.3c-15 190 


400 


3.077548c+01 42.09 3.0c-14 


3.077540c+01 9.80 1.8c-07 


3.077655c+01 37.07 7.5c-15 207 


3.077580c+01 33.91 7.7c-15 188 


Ex.9, Hereditary be, n = 1869 


5 


4.360733c+02 164.03 8.4o-15 


4.357125c+02 488.10 6.2c-08 


4.356901c+02 4.53 7.0c-15 63 


4.356901c+02 4.39 7.1c-15 56 


20 


6.425556c+01 146.43 1.0c-14 


6.425347c+01 249.65 8.6c-08 


6.425244c+01 4.77 7.8c-15 65 


6.425246c+01 4.71 7.5c-15 64 


50 


5.199661c+01 111.63 1.6c-14 


5.199414c+01 183.13 7.9c-08 


5.199149c+01 9.64 7.8c-15 148 


5.199230c+01 9.04 7.9c-15 138 


100 


5.026473c+01 130.12 2.1c-14 


5.026380c+01 179.67 2.5o-08 


5.026435c+01 11.06 7.6c-15 103 


5.026416c+01 11.74 7.7c-15 113 


200 


4. 989108o+01 155.94 3.0o-14 


4.969824c+01 81.58 5.8c-07 


4.969883c+01 17.91 7.6c-15 99 


4.969873c+01 20.05 7.7c-15 114 


250 


4.984306c+01 158.41 3.1c-14 


4.966047c+01 83.73 9.3c-10 


4.966067c+01 24.46 7.7c-15 113 


4.966072c+01 29.67 7.9c-15 141 


300 


4.967235c+01 124.68 3.3c-14 


4.965503c+01 32.78 4.4c-09 


4.965534c+01 33.17 7.8c-15 133 


4.965555c+01 38.05 7.7e-15 154 


400 


4. 965503o+01 38.12 1.4c-09 


4. 965503o+01 32.86 4.4o-09 


4.965655c+01 69.25 7.5c-15 211 


4.965616c+01 82.33 7.8c-15 250 
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Table 6.2: Numerical results of the nearest low-rank correlation matrix problem with H ^ 1 





PenCorr 


AFBB 


OptM 


r 


residual time feasi 


residual time feasi nfge 


residual time feasi nfge 


Ex.3, n — 500, random H 


2 


9.121537c+02 79.34 2.9c-09 


9.119687c+02 0.51 3.7c-15 48 


9.119687c+02 0.48 3.5c-15 50 


5 


4.541414c+02 49.04 7.5c-09 


4.539433c+02 0.70 3.1c-15 85 


4.539437c+02 1.40 3.0c-15 201 


20 


8.812253c+01 34.96 2.6e-07 


8.809186c+01 1.21 3.5e-15 156 


8.809152c+01 1.41 3.7e-15 185 


50 


2.194686c+01 42.49 9.3e-07 


2.189083c+01 6.28 3.6e-15 731 


2.188882e+01 6.80 3.6e-15 787 


100 


6.444922c+00 50.29 2.6c-07 


6. 288869o+00 36.62 4.2c-15 3011 


6.295220c+00 37.29 4.1c-15 3116 


125 


4.056629c+00 73.30 9.3c-07 


3.938597c+00 40.11 4.1c-15 3008 


3.958023c+00 41.18 4.3c-15 2948 


Ex.4, n = 943, H is given by T. Fushiki 


5 


1.147849c+04 1138.22 5.7e-07 


1.139629c+04 37.40 5.3e-15 1549 


1. 139265e+04 40.40 5.1e-15 1670 


20 


5.219117c+03 953.21 2.7e-07 


5.195025c+03 17.88 5.8e-15 678 


5. 194281o+03 28.15 5.8o-15 1080 


50 


3.718507c+03 585.96 2.0c-09 


3.711930c+03 13.16 6.6c-15 406 


3.711751e+03 15.70 6.7c-15 488 


100 


3.507128c+03 388.21 3.3c-07 


3.502504c+03 11.50 7.3c-15 297 


3.502538e+03 12.75 7.2c-15 324 


150 


3.505351c+03 359.97 3.1e-07 


3.500667c+03 12.04 7.7e-15 247 


3. 500648o+03 14.93 7.8o-15 299 


200 


3.505351c+03 365.10 3.2e-07 


3. 500655o+03 16.25 7.8e-15 277 


3.500676e+03 16.72 8.0c-15 291 


250 


3.505351c+03 361.32 3.2c-07 


3. 500723o+03 16.68 7.9c-15 251 


3.500700e+03 17.46 8.3c-15 267 


Ex.5, Lymph, n — 587, H is created by Example 2 in [24] 


2 


7.267954e+04 3123.45 4.0c-07 


7. 261145o+04 3.41 3.6o-15 384 


7. 270611e+04 9.70 3.8o-15 1119 


5 


3.827079c+04 2433.43 9.1c-07 


3.821899e+04 3.37 4.1e-15 355 


3.821942e+04 11.41 3.8c-15 1244 


20 


1.225500e+04 1995.87 l.le-07 


1.221864c+04 2.39 4.6o-15 227 


1.222005c+04 6.70 4.9c-15 652 


50 


4.712212c+03 1358.60 7.3c-08 


4.703913c+03 6.14 5.2c-15 496 


4.705611e+03 7.49 5.6c-15 562 


100 


2.281151c+03 640.71 6.9c-08 


2.279562c+03 9.63 5.4e-15 574 


2.279855c+03 10.64 6.1c-15 642 


150 


1.926714c+03 357.12 1.4e-08 


1.926830c+03 12.02 5.5o-15 557 


1. 926841o+03 14.29 5.8o-15 666 


200 


1.906004c+03 73.82 9.4e-07 


1.906487o+03 11.45 5.9o-15 447 


1.906900c+03 19.03 6.2c-15 742 


Ex.6, ER, ?i = 692, H is created by Example 2 in [24] 


2 


8.830494c+04 7456.08 5.7c-07 


8.821432c+04 5.16 4.1e-15 388 


8.822324e+04 19.33 4.1c-15 1512 


5 


4.392451c+04 3121.60 3.0e-07 


4.385944c+04 9.42 4.6c-15 677 


4. 385982o+04 13.76 4.4o-15 1002 


20 


1.283775c+04 5027.68 8.5e-07 


1.281657c+04 4.69 5.0c-15 300 


1.281796e+04 7.94 5.2c-15 521 


50 


5.232705c+03 1894.25 8.1c-08 


5.231603c+03 6.68 5.3e-15 349 


5.226695c+03 10.30 6.3c-15 544 


100 


2. 921895e+03 1329.12 6.0c-08 


2.919370c+03 18.26 5.6c-15 717 


2.919529c+03 18.98 6.1c-15 744 


150 


2.570773c+03 375.28 3.1o-08 


2. 570897o+03 16.95 6.0c-15 503 


2.570835c+03 19.29 6.2e-15 569 


200 


2.538172c+03 125.18 7.0o-09 


2.538801c+03 13.67 5.9o-15 334 


2.539434e+03 26.39 6.5o-15 644 


Ex.7, Arabidopsis, n — 834, H is created by Example 2 in [24] 


5 


4.588307e+04 8878.51 3.0c-07 


4.587489e+04 7.76 4.9e-15 379 


4.558236c+04 24.60 4.6e-15 1263 


20 


1.224684c+04 3800.10 1.2c-07 


1.224579c+04 4.12 5.3o-15 181 


1. 224775o+04 12.51 5.7o-15 577 


50 


5. 420976O+03 4226.01 1.3e-07 


5.412337c+03 10.63 5.7c-15 391 


5.413242c+03 17.45 6.6c-15 639 


100 


3.721807e+03 2109.99 6.6c-08 


3.719793c+03 18.28 5.7c-15 513 


3.719806e+03 19.88 5.9c-15 554 


150 


3.504307c+03 616.24 1.7c-08 


3.504437c+03 25.07 6.0o-15 550 


3.504858e+03 33.63 6.3c-15 733 


200 


3.487599c+03 172.79 1.9c-08 


3.488259e+03 26.80 6.2c-15 491 


3.489672e+03 39.35 7.1c-15 713 


250 


3.487634c+03 51.87 l.lo-07 


3.488184c+03 22.29 6.2o-15 344 


3.489718e+03 37.69 7.4c-15 582 


Ex.8, Leukemia, n — 1255, H is created by Example 2 in [24] 


5 


7.147916c+04 48580.58 5.1c-07 


7.133061e+04 16.36 5.5c-15 352 


7.133147e+04 69.15 5.5c-15 1572 


20 


2.202680e+04 22292.32 1.5c-07 


2.199789c+04 11.11 6.2c-15 221 


2.200121c+04 48.35 6.6c-15 1026 


50 


8.846229c+03 14071.77 7.6e-08 


8.835228c+03 18.44 7.3e-15 332 


8. 837748o+03 39.10 8.5o-15 703 


100 


6.245291c+03 6669.82 1.5c-07 


6.242296c+03 31.18 7.3c-15 451 


6. 242565o+03 32.62 7.3c-15 470 


150 


5.955934c+03 2330.50 1.9c-08 


5.956206c+03 38.50 7.7c-15 441 


5.958603c+03 73.59 9.0c-15 884 


200 


5.913864c+03 498.85 6.8c-09 


5.914920c+03 59.20 7.6e-15 611 


5.917427c+03 102.76 l.lc-14 1089 


250 


5. 911691e+03 117.54 6.7e-08 


5.912737e+03 34.54 7.8e-15 315 


5. 915785o+03 100.63 l.lc-14 916 


300 


5.911691c+03 216.05 3.6e-08 


5.912814c+03 60.74 8.0c-15 508 


5.915812c+03 108.02 l.lc-14 898 


Ex.9, Hereditary be, n — 1869, H is created by Example 2 in [24] 


5 


9.399313e+04 83138.79 2.4c-07 


9.383631e+04 29.41 7.1c-15 276 


9.383799c+04 172.38 7.4o-15 1725 


20 


1.381214c+04 50612.08 9.0o-07 


1.379194e+04 17.56 7.9e-15 145 


1.379812c+04 66.39 8.4c-15 622 


50 


1.069274c+04 51791.85 1.3e-07 


1. 068460o+04 51.18 7.9e-15 385 


1. 068499o+04 57.49 8.1o-15 437 


100 


1. 030237o+04 23339.73 1.2c-07 


1.028312c+04 62.27 7.9c-15 409 


1.028567c+04 115.20 8.6c-15 785 


150 


1. 022679O+04 5226.98 1.2c-07 


1.022771c+04 78.29 8.1c-15 475 


1.023226c+04 117.95 9.7c-15 717 


200 


1.022183c+04 4597.12 4.6e-07 


1.022377c+04 78.44 7.9c-15 387 


1. 022821o+04 180.81 1.0c-14 896 


250 


1.022195c+04 334.39 4.4e-09 


1.022307c+04 79.57 8.4c-15 357 


1.022829c+04 227.50 1.0o-14 1059 


300 


1.022195c+04 312.65 2.5e-09 


1.022380c+04 79.41 8.3c-15 323 


1.022891c+04 193.14 1.3o-14 781 


400 


1.022195c+04 315.47 1.5c-07 


1.022367c+04 116.95 8.8c-15 385 


1.022827c+04 336.21 1.3c-14 1135 
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Tabic 6.3: Numerical results of the nearest low-rank correlation matrix problem with equality requirements 





PenCorr 


AFBB 


r 


residual 








residual 


time 






nfgc 










Ex.10, n — 


943, H = 1 










10 


3.532719c+02 


1082.09 


1.95c-08 


1.55c-07 


3.529072c+02 


73.07 


5.67c-15 


1.73c-08 


2678 


20 


3.026229c+02 


312.73 


1.77c-09 


1.23c-08 


3.025621c+02 


19.56 


5.74c-15 


8.82c-09 


677 


50 


2.840583o+02 


93.80 


3.69c-08 


8.680-O8 


2.840568c+02 


15.07 


6.34c-15 


5.12c-08 


446 


100 


2.827108c+02 


36.28 


8.75c-08 


6.56c-08 


2.827154c+02 


16.92 


6.99c-15 


5.78c-09 


398 


125 


2.827102c+02 


25.24 


9.28c-09 


5.67c-09 


2.827151c+02 


18.15 


6.67c-15 


1.27c-08 


358 


150 


2.827102c+02 


23.66 


9.28c-09 


5.67c-09 


2.827136c+02 


18.10 


6.75c-15 


1.05c-08 


356 


200 


2.827102c+02 


23.38 


9.28o-09 


5.67c-09 


2.827172c+02 


23.97 


6.81e-15 


I.680-O8 


379 


250 


2.827102c+02 


23.52 


9.28o-09 


5.67c-09 


2.827161c+02 


23.64 


6.97o-15 


2.08c-08 


355 








Ex.10, 


n = 943, H 


given by T. Fushiki 








10 


8.305820c+03 


5292.66 


2.99c-08 


4.69c-07 


8.237722c+03 


200.93 


5.67o-15 


1.20c-08 


6221 


20 


5.725304c+03 


3261.71 


3.62G-09 


2.03c-08 


5.692092c+03 


94.22 


5.71o-15 


1.08c-08 


2810 


50 


4.170549c+03 


1540.47 


1.51c-08 


1.38c-08 


4.164945c+03 


63.46 


6.51c-15 


3.20c-09 


1658 


100 


3.939418c+03 


1012.39 


1.47G-08 


9.32c-09 


3.935732c+03 


54.98 


7.36o-15 


1.05c-08 


1189 


125 


3.936385c+03 


982.11 


7.68c-09 


7.01c-09 


3.932549c+03 


56.54 


7.36c-15 


8.38c-08 


1103 


150 


3.936385c+03 


981.92 


1.74c-08 


1.15c-08 


3.932716c+03 


61.93 


7.54c-15 


1.49c-08 


1128 


200 


3.936385c+03 


905.50 


1.52c-08 


1.04c-08 


3.932755c+03 


60.27 


7.31c-15 


2.13c-08 


1154 


250 


3.936385c+03 


917.70 


1.66c-08 


l.lOc-08 


3.932879c+03 


67.10 


8.05o-15 


1.94G-08 


1148 



where 

q = f Qij, (hi) or G B e ; , R . = f 1, or G B e : 

lJ \ 0, otherwise, e 13 \ 0, otherwise. 

Starting from the initial point C pca or C vca . Ao = 0, /io > 0, the procedure of the augmented Lagrangian 
method is as follows: 

{Vfc+i :w argmin v e R r * „ L Mit ( V. A k ) . s.t. ||V^|| 2 = 1, i = 1, • • • , n, 
A k +i := Ak - Mfctfe (V?V k - Q fc ), (6.6) 
fi k +i ■= 10/ifc. 

Note that the subproblem is in nature a low-rank nearest correlation matrix problem. We can inexactly solve 
it by the AFBB method. Specifically, for the fc-th subproblem, the main parameters of AFBB is set to be 

e k = max{0.1e fc _i, 1CT 5 }, e Xtk = max{0.1e x , fc _i, 10" 5 }, e ftk = max{0.1e/ >fc _i, 10" 8 }, Maxltcr fe = 2000, 

where eo = 10 _1 ,e2;,o = 10~ 3 ,e/. o = 10 -5 . Denote by v k = Y^(i j)<£B c l^i^fcJ ~ I tne constraint violation of 
V^Vj — qij = 0, <E B e corresponding to V k , we terminate the procedure (6.6) when v k +\ < 3 x 10 -8 or 

Wk+i -vh\ < 10- 8 . 

Below we consider a test instance of problem (6.3). 

Ex.10: The matrices C, H are the same as the ones in Ex.4. The index set B e consists of min{n e ,n — i} 
randomly generated integers from {1, ■ • • ,n} with n e = 3. We set = for G B e . 

The numerical results are presented in Table 6.3. In this table, "const." represents the constraint violation 
of V?Vj - q tj = 0, (i, j) G B e . From this table, we can see the AFBB method performs significantly better than 
PenCorr in terms of CPU time and residual quality, except for the large r corresponding to H = 1. It is worth 
noting that for the case when H ^ 1, AFBB exhibites more than 10-fold improvement over PenCorr in terms 
of CPU time, and at the same time it can return a solution with lower residual. 

6.3 Kohn-Sham energy minimization 

We test in this subsection a class of nonlinear eigenvalue problems known as Kohn-Sham (KS) equations, 
which arise in electronic structure calculations. The original KS equations or KS energy minimization problem 
is a continuous nonlinear problem. To solve the problems numerically, we turn them into finite-dimensional 
problems. Let the unitary matrix X G <C nxp be the approximation of the electronic wave functions of p occupied 
states, where n is the spatial degrees of freedom. Using the planewave basis, we define the finite-dimensional 
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approximation to the total energy functional as follows: 

StotalW = tt(x* (l£ + V ion ^Xj + \p T L^ P + P T e xc {p) + ^Ewald + £rcp, (6.7) 

where L is a finite-dimensional representation of the Laplacian operator in the planewave basis, V[ on denotes 
the ionic pseudopotcntials sampled on the suitably chosen Cartesian grid, p = diag(JTAT*) represents the charge 
density, the matrix means the pseudo-inverse of L, and e xc (p) is the exchange-correlation energy per particle 
in a uniform electron gas of density p. The last two terms in (6.7) are constant shifts of the total energy (see 
[49] for the details). Consequently, the discretized KS energy minimization can be formulated as 

min Et ota i(X), s.t. X*X = I p . (6.8) 

For the above problem, the KKT conditions; i.e., the KS equations, are 

H(X)X - XA P = 0, X*X = I P . (6.9) 

Here, H(X) = + Vi on + Di&g(L 1< p) + Dia,g(p xc (p)), p xc {p) = dp xc (p) / dp and A p is ap-by-p symmetric matrix 
of Lagrangian multipliers. 

Generically, it is sophisticated to compute the objective function and its gradient in (6.8). Thus, we use the 
Matlab Toolbox KSSOLV [49] which is tailored for easily developing new algorithms for solving the KS equations 
to do so. In order to show the efficiency of our AFBB method, we compared it with the self-consistent field 
(SCF) iteration which is currently the most widely used approach for the KS equations, this SCF iteration is 
provided in KSSOLV. We also compared AFBB with OptM [48] for KS problem (6.8) but only the perfomance 
of AFBB is reported because they performed similarly. The maximal iteration of SCF was set to be 200 while 
the other parameters were set to be their default values in KSSOLV. For the sake of fairness, we improved the 
stopping accuracy of AFBB; i.e., resetting e = 10~ 6 , e x = 10 -10 , e/ = 10~ 14 , Maxlter = 1000, to obtain a higher 
quality solution. The termination rules are not directly comparable due to the different formulations of the 
problem used by SCF and AFBB. Specifically, SCF focuses on KS equations (6.9) and needs to solve a series 
of linear eigenvalue problems, while AFBB minimizes the total energy directly. However, as shown later by the 
numerical results in Table 6.4, we see that on average the chosen stopping criteria for AFBB are tighter than 
those of SCF in terms of the residual \\HX — H(X*HX)\\p. For each problem, we ran the two algorithms 10 
times from different random initial points generated by the function "genXO" provided in KSSOLV. 

A summary of the numerical results on 18 standard testing problems is reported in Table 6.4. In this table, 
"-Etotal" denotes the total energy function value, "iter" the total number of iterations, "resi" , "ave.feasi" and 
"ave.time" the residual \\HX — X(X* HX)\\p, the averaged violation of the constraint X* X = I p and the 
averaged CPU time in seconds, respectively. We use "ave.err" to denote the averaged relative errors between 
the averaged total energy Z\ given by AFBB or the averaged total energy given by SCF and the minimal 
of zi and Z2, which is computed by £i_Sj5jjiiM or ^-minjz^} p th t u th t th AFBB 

method is considerably competitive and it can always take less CPU time than SCF to find the better solutions 
in terms of the total energy and the residual especially for the large molecules. In particular, for the largest 
problems "pentacene" in our test, AFBB is not only avcragely about three times faster than SCF, but also 
returns a better solution with smaller total energy and residual. 



6.4 Maximization of sums of heterogeneous quadratic functions on the Stiefel manifold from 
statistics 

In [5], Balogh et al. gave a test problem with known global optimal solutions as follows: 

p 

min J^XLAiXu), s.t. X T X = I p , (6.10) 

i—l 

where for 1 < i < p, Aj = diag(n(i — 1) + 1, • • • , n(i — 1) + i — 1, n(i — 1) + i + 1, • • • , ni) and U < 0. This 
is a special maximization of sums of heterogeneous quadratic functions on the Stiefel manifold from statistics 
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Table 6.4: Numerical results of the Kohn-Sham energy minimization 



solvcr| -^total (min, mean, max) rcsi (min, mean, max) |itcr(min, mean, max)|avc.l'easi| avc.crr |ave.time 


alanine, n — 12671, p — IS 


AFBB 


(-6.11619212e+01, -6.11619212c+01, -6.11619212c+01) 


(3.3o-07, 5.0o-07, 6.5c-07) 


(71, 80.7, 109) 


4.2c-14 


O.Oc+00 


117.2 


SCF 


(-6.11619212c+01, -6.11619212e+01, -6.11619212c+01) 


(l.lc-06, 2.0o-06, 3.6c-06) 


(17, 53.5, 200) 


2.0c-14 


5.0c-14 


179.9 


al, n = 16879, p = 12 


AFBB 


(-1. 58038176o+01, -1.58038070c+01. -1.58037911c+01) 


(3.7o-07, 2.I0-O6, 7.8o-06) 


(729, 944.7, 1000) 


4.0c-15 


O.Oc+00 


1127.9 


SCF 


(-1. 58009286o+01, -1.57903910c+01, -1.57753330c+01) 


(2.6c-03, 7.5o-03, 1.8c-02) 


(200, 200, 200) 


5.3c-15 


8.5o-04 


1409.3 


benzene, n — 8407, p — 15 


AFBB 


(-3. 72257514O+01, -3.72257514o+01, -3.72257514c+01) 


(2.1c-07, 3.5o-07, 4.8c-07) 


(66, 71, 82) 


3.4o-14 


O.Oc+00 


55.6 


SCF 


(-3.72257514e+01, -3.72257514c+01, -3.72257514c+01) 


(7.6c-07, 1.5o-06, 2.7c-06) 


(18, 57, 200) 


l.Oc-13 


l.lc-13 


96.2 


c2h6, n = 2103, p = 7 


AFBB 


(-1.44204913C+01, -1.44204913c+01. -1 .44204913c+01) 


(2.1o-07, 3.4o-07, 5.4o-07) 


(52, 55.2, 60) 


1.3c-14 


O.Oc+00 


7.8 


SCF 


(-1.44204913O+01, -1.44204913o+01, -1.44204913c+01) 


(l.lc-06, 1.56o-06, 2.3o-06) 


(14, 22.2, 81) 


1.6c-14 


l.Oc-13 


8.3 


cl2h26, n = 5709, p = 37 


AFBB 


(-8.15360919C+01, -8.15360919c+01. -8.15360919e+01) 


(1.7o-07, 5.8o-07, 8.6O-07) 


(70, 80.7, 94) 


6.2c-14 


O.Oc+00 


128.2 


SCF 


(-8. 15360919o+01, -8.15360919o+01, -8.15360919c+01) 


(1.7o-06, 3.5o-06, 4.2c-06) 


(21, 37.4, 80) 


5.6c-14 


1.5c-13 


143.4 


co2, n = 2103, p = 8 


AFBB 


(-3.51243958c+01, -3.51243958e+01, -3.51243958o+01) 


(7.0c-08, 2.1o-07, 3.2c-07) 


(57, 62.6, 65) 


1.8c-14 


O.Oc+00 


7.6 


SCF 


(-3.51243958c+01, -3.51243958c+01, -3.51243958c+01) 


(l.lc-06, 1.7c-06, 2.60-O6) 


(13, 15.8, 27) 


6.30-15 


5.6c-14 


7.0 


ctubc661, n = 12599, p = 48 


AFBB 


(-1. 34638432o+02, -1.34638432o+02, -1.34638432c+02) 


(3.1c-07, 6.3c-07, 9.1c-07) 


(68, 83.5, 98) 


4.0c-14 


O.Oc+00 


365.2 


SCF 


(-1.34638432O+02, -1.34638432c+02, -1.34638432c+02) 


(1.5c-06, 3.0c-06, 5.8c-06) 


(19, 109.6, 200) 


4.0o-14 


1.7c-ll 


953.4 


glutamine, n — 16517, p — 29 


AFBB 


(-9. 18394252O+01, -9.18394252c+01. -9. 18394252c+01) 


(4.2o-07, 7.2c-07, 9.1o-07) 


(86, 98.4, 123) 


8.0c-14 


O.Oc+00 


370.7 


SCF 


(-9. 18394252o+01, -9.18394252c+01, -9.18394252c+01) 


(9.6c-07, 2.5c-06, 3.1c-06) 


(22, 54.6, 200) 


2.9o-14 


3.2c-14 


444.8 


graphenel6, n — 3071, p — 37 


AFBB 


(-9. 40462175o+01, -9.40462175c+01, -9.40462175o+01) 


(2.2o-07, 4.8c-07, 6.2o-07) 


(198, 254.8, 370) 


1.20-14 


O.Oc+00 


196.9 


SCF 


(-9. 40334965O+01, -9.39758903c+01. -9.38398259c+01) 


(2.1o-04, 1.2c-02, 6.2o-02) 


(200, 200, 200) 


1.7c-14 


7.5c-04 


1042.6 


graphcnc30, n = 12279, p = 67 


AFBB 


(-1. 73595105o+02, -1.73595105c+02, -1.73595105c+02) 


(5.5c-07, 7.7c-07, 8.8c-07) 


(273, 325.1, 386) 


2.1o-14 


O.Oc+00 


2105.2 


SCF 


(-1. 73588629o+02, -1.73530599c+02, -1.73263182c+02) 


(3.1c-04, 9.5c-03, 3.8c-02) 


(200, 200, 200) 


3.4c-14 


3.7c-04 


8617.8 


h2o, n = 2103, p = 4 


AFBB 


(-1.64405072C+01, -1.64405072c+01. -1 .64405072c+01) 


(2.6c-08, 9.2c-08, 1.8o-07) 


(57, 65.8, 78) 


2.3c-14 


O.Oc+00 


5.7 


SCF 


(-1. 64405072O+01, -1.64405072c+01, -1 .64405072c+01) 


(8.5c-07, 1.2o-06, 1.7c-06) 


(12, 16.6, 26) 


4.5c-15 


7.6c-14 


4.7 


hnco, n = 2103, p = 8 


AFBB 


(-2. 86346644o+01, -2.86346644c+01. -2.86346644c+01) 


(2.4c-07, 4.0c-07, 4.6o-07) 


(98, 110.8, 122) 


2.8c-14 


1.5o-14 


14.1 


SCF 


(-2.86346644C+01, -2.86346644c+01. -2.86346644c+01) 


(l.lc-06, 1.2O-06, 1.2o-06) 


(15, 15, 15) 


6.1c-15 


O.Oc+00 


10.0 


nic, n — 251, p — 7 


AFBB 


(-2. 35435300O+01, -2.35435300c+01, -2.35435300c+01) 


(l.lc-06, 1.4c-06, 1.6c-06) 


(47, 51, 59) 


1.2c-14 


5.4c-13 


3.3 


SCF 


(-2. 35435300o+01, -2.35435300c+01, -2.35435300c+01) 


(l.lc-06, 1.2c-06, 1.3o-06) 


(14, 14, 14) 


3.1c-15 


O.Oc+00 


3.9 


pentacene, n — 44791, p — 51 


AFBB 


(-1. 31890295O+02, -1.31890295e+02, -1 .31890295c+02) 


(2.1c-07, 6.1c-07, 8.6c-07) 


(95, 110.6, 126) 


1.5c-14 


O.Oc+00 


1723.2 


SCF 


(-1. 31890295o+02, -1.31890295c+02, -1.31890295c+02) 


(9.5c-07, 6.9c-06, 3.0c-05) 


(20, 129.8, 200) 


5.2c-14 


4.5o-ll 


4978.8 


ptnio, n = 4069, p = 43 


AFBB 


(-2.26788843O+02, -2.26788843c+02. -2.26788843c+02) 


(2.5o-06, 3.6c-06, 4.O0-O6) 


(391, 451.7, 515) 


1.3c-14 


O.Oc+00 


460.7 


SCF 


(-2. 26788843O+02, -2.26788842c+02, -2.26788840c+02) 


(l.Oc-06, 3.7c-06, 7.9c-06) 


(200, 200, 200) 


1.9c-14 


1.3c-09 


912.1 


qdot, n = 2103, p = 8 


AFBB 


(2.76998001c+01, 2.76998025c+01, 2.76998154c+01) 


(4.2c-06, 2.1c-04, 1.6c-03) 


(1000, 1000, 1000) 


3.4o-15 


O.Oc+00 


76.5 


SCF 


(2. 76999754o+01, 2.77043674o+01, 2.77127111o+01) 


(4.5o-03, 5.0c-02, 2.I0-OI) 


(200, 200, 200) 


5.2c-15 


1.7c-04 


99.7 


si2h4, n = 2103, p = 6 


AFBB 


(-6. 30097505o+00, -6.30097505e+00, -6.30097505c+00) 


(2.8c-07, 4.3c-07, 5.7c-07) 


(73, 80.2, 99) 


1.6c-14 


O.Oc+00 


9.2 


SCF 


(-6. 30097505o+00, -6.30097505o+00, -6.30097505c+00) 


(9.2c-07, 1.3c-06, 1.6c-06) 


(15, 19.3, 44) 


1.5c-14 


1.7c-13 


8.1 


sih4, n = 2103, p = 4 


AFBB 


(-6.17692799C+00, -6. 17692799c+00. -6.17692799c+00) 


(4.1c-08, 7.3c-08, 1.2o-07) 


(50, 53.4, 58) 


9.0c-15 


O.Oc+00 


6.1 


SCF 


(-6. 17692799o+00, -6.17692799c+00, -6.17692799c+00) 


(7.4c-07, l.Oc-06, 1.6c-06) 


( 9, 11, 13) 


4.50-15 


2.3c-13 


4.9 



24 



Table 6.5: Numerical results of AFBBD 1/2 and AFBB£> 1/4 for problem (6.10) 



random I 






AFBBD 1/2 


AFBBD 1/4 


ave. save. ratio 


P 


/* 


ave.obj 




avc.nfc 


avc.crr 


ave.obj 


avc.nfc 


ave.err 


% 


1 


-3. 56639031o-01 


-3.56638763c-01 


351.0 


7.51c-07 


-3.56638763c-01 


351.0 


7.51c-07 


0.0 


2 


-6.62874870c-01 


-6.62874470c-01 


461.7 


6.03o-07 


-6.62874407c-01 


460.4 


6.98c-07 


-0.3 


5 


-2.58879404c+00 


-2. 58879214o+00 


604.6 


7.35c-07 


-2.58879302c+00 


496.2 


3.93c-07 


-17.9 


10 


-4.85391785c+00 


-4.85391320c- 


fOO 


728.0 


9.58c-07 


-4. 85391578o+00 


577.8 


4.26o-07 


-20.6 


20 


-1.05472851c+01 


-1.05472762c- 


fOl 


834.3 


8.42c-07 


-1. 05472797o+01 


642.0 


5.15c-07 


-23.0 


40 


-2.02220608c+01 


-2.02220450c- 


fOl 


916.8 


7.80c-07 


-2. 02220503o+01 


655.4 


5.21o-07 


-28.5 


60 


-2.89590379c+01 


-2.89590046c- 


fOl 


939.7 


1.15c-06 


-2. 89590164o+01 


661.6 


7.41c-07 


-29.6 


80 


-4.33419641c+01 


-4.33419332c- 


fOl 


975.2 


7.14c-07 


-4. 33419460O+01 


687.8 


4.18o-07 


-29.5 


100 


-4.91784679c+01 


-4.91784228c- 


fOl 


1010.0 


9.17c-07 


-4. 91784501o+01 


733.7 


3.63c-07 


-27.4 










U 


-1, 4 = 1,- 










1 


-1.00000000e+00 


-9.99999777c-01 


!J! . 1 


2.23c-07 


-9.99999777c-01 


300.4 


2.23c-07 


0.0 


2 


-2.00000000e+00 


-1.99999916c- 


fOO 


388.6 


4.21c-07 


-1.99999952c+00 


385.1 


2.41c-07 


-0.9 


5 


-5.00000000c+00 


-4.99999753c- 


fOO 


579.4 


4.94o-07 


-4. 99999880o+00 


506.1 


2.39o-07 


-12.7 


10 


-l.OOOOOOOOc+Ol 


-9.99999261c- 


fOO 


712.1 


7.39c-07 


-9.99999666c+00 


559.7 


3.34c-07 


-21.4 


20 


-2.00000000c+01 


-1.99999803c- 


fOl 


810.1 


9.86c-07 


-1. 99999932o+01 


602.2 


3.41c-07 


-25.7 


40 


-4.00000000c+01 


-3.99999658c- 


fOl 


861.7 


8.55c-07 


-3. 99999819o+01 


627.4 


4.52c-07 


-27.2 


60 


-6.00000000c+01 


-5.99992796c- 


fOl 


971.7 


1.20o-05 


-5. 99999791o+01 


669.3 


3.48c-07 


-31.1 


80 


-8.00000000c+01 


-7.99999149c- 


fOl 


974.0 


1.06c-06 


-7. 99999670o+01 


683.5 


4.13c-07 


-29.8 


100 


-1.00000000e+02 


-9.99999086c- 


fOl 


997.6 


9.14c-07 


-9. 99999680o+01 


688.0 


3.20c-07 


-31.0 



(it was also considered in [7]). By Proposition 1 in [5], we know that {(±ei,±e2, • • • ,±e p ) : ±e, <E {ej, — ei}} 
is the set of minimum points of problem (6.10) and JZj—i k ' 1S the optimal function value. It was pointed out 
in [5] that there were no efficient numerical methods to solve problem (6.10) yet. Nevertheless, our numerical 
tests show that the AFBB method works well. 

In this experiment, we reset e = 10~ 6 ,e x = 10 _6 ,e/ = 10 -10 and fixed n = 4000. For 1 < p < n and 
1 < i < p 1 wc generated U in two ways, one is that ^ is uniformly distributed in [0, 1], the other is that li = — 1. 
We ran our AFBB methods 50 times from different random initial points for each test. 

Firstly, we investigate the effect of using different descent directions. We call AFBB methods using the 
descent directions D 1 / 2 and D 1 / 4: as AFBBZ^/2 and AFBBDj/4, respectively. The numerical results are shown 
in Table 6.5. In this table, /* denotes the optimal function value, "ave.obj", "ave.nfe" and "ave.err" denote the 
average function value given by each method, the average total number of function evaluations and the average 
relative error between the function value given by each method and /*, respectively. Wc use "ave. save. ratio" 
to stand for the average saved ratio of AFBB£>i/ 4 which is computed as 100(nfe (9=1 / 4 — nfe (3= i/ 2 )/nfe (!)= i/2. 
From this table, we know that the two methods can always find nearly global solutions in acceptable iterations. 
Averagcly, AFBBZ^/4 can always find a better solution with smaller function value about 25% faster than 
AFBBZ?!/ 2 for most of the tests. This may be due to the fact that Di/<± is the steepest descent direction 
corresponding to the Euclidean metric] i.e., 

D 1/4 = argmin - (G, D)/\\D\\ F s.t.DeTx- 

Secondly, we consider the effect of using different g(r)'s in forming J(r). We tested two choices: <?i(t) = t/2 
which is the default in update scheme (3.12) and (?2(t) = \ re ~ r ■ We call AFBB methods using gi(r) and 32(1") as 
AFBBgl and AFBBg2, respectively. From Table 6.6, we see that AFBBg2 improves the performance of AFBBgl 
by 15% or more for most tests in terms of the average total number of function evaluations. Meanwhile, it can 
always return a soultion with smaller function value. Nevertheless, it remains under investigation how to seek 
a better g(r). 

At the end of this subsection, we remark that X T = G T X may happen in the nearest low-rank correlation 
matrix problem and the Kohn-Sham energy minimization. In this case, the term X T D p vanishes and g{r) will 
not play a role in forming J(t). 
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Table 6.6: Numerical results of AFBBgl and AFBBg2 for problem (6.10) 



random I 






AFBBgl 


AFBBg2 


ave. save. ratio 


P 


/* 


ave.obj 


avc.nfc 


avc.crr 


ave.obj 


avc.nfc 


ave.err 


% 


1 


-5.60516861c-01 


-5. 60516557o-01 


315.1 


5.43c-07 


-5.60516557c-01 


315.1 


5.43c-07 


0.0 


2 


-6.75531389c-01 


-6. 75530921o-01 


422.2 


6.93c-07 


-6.75530944c-01 


436.2 


6.59c-07 


3.3 


5 


-9.62157491c-01 


-9. 62154540o-01 


651.5 


3.07c-06 


-9.62157125c-01 


543.5 


3.81c-07 


-16.6 


10 


-4.21478020e+00 


-4. 21477686o+00 


762.1 


7.93c-07 


-4. 21477730o+00 


628.7 


6.89o-07 


-17.5 


20 


-9.50648759c+00 


-9.50647798c+00 


891.4 


l.Olc-06 


-9. 50648411o+00 


799.4 


3.66c-07 


-10.3 


40 


-2.13247908c+01 


-2. 13247747o+01 


972.8 


7.52c-07 


-2. 13247857o+01 


775.9 


2.36c-07 


-20.2 


60 


-3.09438372C+01 


-3.09438150C+01 


859.4 


7.16c-07 


-3. 09438274o+01 


769.6 


3.17c-07 


-10.4 


80 


-3.66477439c+01 


-3. 66477202o+01 


947.1 


6.47o-07 


-3. 66477277o+01 


809.3 


4.42c-07 


-14.5 


100 


-5.01228037c+01 


-5. 01227673O+01 


912.0 


7.28c-07 


-5.01227835c+01 


824.3 


4.03c-07 


-9.6 








U 


-1, i = 1, • 










1 


-1.00000000e+00 


-9. 99999782o-01 


i i ( . 1 


2.18c-07 


-9. 99999782o-01 


310.4 


2.18c-07 


0.0 


2 


-2.00000000e+00 


-1. 99999943o+00 


380.7 


2.86c-07 


-1. 99999951o+00 


406.9 


2.44c-07 


6.9 


5 


-5.00000000c+00 


-4. 99999783o+00 


561.7 


4.35c-07 


-4.99999852c+00 


500.9 


2.95o-07 


-10.8 


10 


-l.OOOOOOOOc+Ol 


-9.99999425c+00 


688.8 


5.75c-07 


-9. 99999678o+00 


571.1 


3.22c-07 


-17.1 


20 


-2.00000000c+01 


-1. 99999869o+01 


799.2 


6.53c-07 


-1.99999936c+01 


662.3 


3.20c-07 


-17.1 


40 


-4.00000000c+01 


-3. 99999675o+01 


839.8 


8.12c-07 


-3. 99999872o+01 


695.7 


3.19c-07 


-17.2 


60 


-6.00000000c+01 


-5. 99999363o+01 


937.4 


1.06c-06 


-5. 99999814o+01 


749.5 


3.09o-07 


-20.0 


80 


-8.00000000c+01 


-7. 99999348o+01 


931.2 


8.15o-07 


-7. 99999419o+01 


748.6 


7.26c-07 


-19.6 


100 


-1.00000000e+02 


-9. 99999263o+01 


929.0 


7.37c-07 


-9. 99999653o+01 


825.0 


3.47c-07 


-11.2 



7 Conclusion 

In this paper, we have proposed a feasible method for optimization on the Stiefel manifold. Our main con- 
tributions are twofold. Firstly, we proposed a new framework of constraint preserving update schemes for 
optimization on the Stiefel manifold by decomposing each feasible point into the range space of X and the null 
space of X T . While this new framework can unify many existing schemes, we also investigated a new update 
scheme with low complexity. Note that our framework can be viewed as a retraction as well. Secondly, we 
proposed the adaptive feasible Barzilai-Borwein-like method and proved its global convergence. To our knowl- 
edge, this result is the first global convergence result for the feasible method with nonmonotone linesearch for 
optimization on the Stiefel manifold. Moreover, the corresponding extension to the generalized Stiefel manifold 
was also considered. 

We have tested our AFBB method on a variety of problems to illustrate its efficiency. Particularly, for the 
nearest low-rank correlation matrix problem, AFBB performs better than two state-of-the-art algorithms; for 
Kohn-Sham total energy minimization, its superiority is obvious especially for large molecules, and hence it is 
quite promising to use our AFBB for large-scale electronic structure calculations. Since our update scheme can 
be done by moving along any given tangent direction, we also explore the effect of different descent directions 
and <?(t)'s on the performance of AFBB. 

As our framework can unify several famous retractions, it is natural and interesting to argue which one can 
make the AFBB method find the global optimal solution with highest probability and at a fastest speed. One 
possible approach is to consider the subspace techniques. This remains under investigation. 
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